Thursday 10 December 2009

libumem - Memory Debugger, Memory Corruption and Memory Leaks

I am currently entangled with a particularly nasty memory corruption in my application, so I am using the libumem memory allocator to try and track it down. Since I am not an expert in this area, I am writing this post in the hope that it helps me retain some of what I learn and proves useful to others out there.

The first thing I found useful was the following blog post on sun.com, which gives a decent overview of libumem's memory buffer structures.

http://blogs.sun.com/amith/entry/detecting_memory_corruption_with_libumem

My problem manifested itself as a core dump in my application, which is written in C. I had turned to the libumem utility to help me analyse the problem. Here are the steps I followed:-

1) Ran my test.

2) Generated a core file for analysis:
gcore <pid>

3) Opened the core file for analysis:
mdb core.<pid>

4) Ran the ::umem_verify dcmd and got the following (output truncated for clarity):-

umem_alloc_80 72a708 clean
umem_alloc_96 72aa88 1 corrupt buffer
umem_alloc_112 72ae08 clean

5) So I looked more closely at the corrupted cache, using its address:-

> 72aa88::umem_verify
Summary for cache 'umem_alloc_96'
buffer fcbe40 (free) seems corrupted, at 0

6) I then looked even further into the buffer address reported by this command, fcbe40.

7) Ran ::bufctl_audit on that buffer address:-

> fcbe40::bufctl_audit

ADDR BUFADDR TIMESTAMP THR LASTLOG CONTENTS CACHE SLAB NEXT DEPTH
00fcbe40 deadbeef deadbeef00000000 deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
0xdeadbeef
libACE.so.5.4.0`__1cGACE_OSEexit6Fi_v_+0x2

8) The following command was then used to print 24 4-byte hex values starting at the buffer address fcbe40 (24 is the count and X sets the format to 4-byte hex, so this covers the full 96 bytes of a umem_alloc_96 buffer):
> fcbe40/24X
0xfcbe40: deadbeef deadbeef deadbeef deadbeef deadbeef 0 deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef

OK, the deadbeef pattern indicates that the buffer has been freed (libumem fills freed buffers with 0xdeadbeef), but the '0', which stands out like a sore thumb, suggests that something wrote a 0 into this area of memory after it was freed.

We can ignore the first two deadbeefs here, as they represent the metadata part of the memory buffer rather than the structure stored in it.

9) Next, I printed the umalog (libumem's transaction log) to see what actions had been carried out on this buffer. The first command below logs mdb's output to a file (mdb.log):

> ::log -e mdb.log
> ::umalog

10) I then grepped the file for the buffer fcbe40. This showed me the stack where that area of memory was last freed, and therefore which structure was being freed. Each field in the structure was 4 bytes long, so, skipping the two metadata words, I counted along to the fourth field (1 - deadbeef, 2 - deadbeef, 3 - deadbeef, 4 - 0) and then looked at my code to see where that field was being used.

Lo and behold, I found that the field was being set after the memory had been freed.
