benwad

Reputation: 6594

Pointer arithmetic in LLDB Python scripts

I've been trying to create a custom data formatter for a custom string type in Xcode. The following code gets me the address of the first character in the string:

def MyStringSummary(valobj, internal_dict):
    # Walk the members down to the raw buffer pointer.
    data_pointer = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data')
    print(data_pointer.GetValue())

That prints out the pointer address. When I look at the memory at that address I can see the wide chars that store the string, so I guess I need to treat this pointer as a pointer to wchar_t and dereference it to get the first character. One of my first approaches was this:

if data_pointer.TypeIsPointerType():
    mychar = data_pointer.Dereference()
    print(mychar.GetValue())
else:
    print("data_pointer is not a pointer!")

This confirmed that data_pointer is a pointer, but the Dereference() call doesn't seem to resolve anything: mychar.GetValue() just returns None. A second question: could I then loop, advancing the address held in data_pointer by a fixed amount each time, dereferencing it to get the next character, and appending that character to the output string? If so, how would I do this?

EDIT:

To help clarify the problem, I'll post some info about the underlying data structure of the string. The definition is too long to post here (also it inherits most of what it does from a generic array base class) but I'll give some more details.

When looking at the memory that StringVar.AllocatorInstance.Data points to, I can see that we're using 16 bits for each character. All of the characters in the string I'm looking at fit in 8 bits, so each one is followed by another 8 bits of 0. This is what happens when I inspect it in the debugger:

(lldb) p (char*)(StringVar.AllocatorInstance.Data)
(char *) $4 = 0x10653360 "P"
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+1
(char *) $6 = 0x10653361 ""
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+2
(char *) $7 = 0x10653362 "a"

So I assume it only shows one character at a time because it treats the 8 zero bits after each 8-bit character as a null terminator. However, when I cast to unsigned short I get this:

(lldb) p (unsigned short*)(StringVar.AllocatorInstance.Data)
(unsigned short *) $9 = 0x10653360
(lldb) p *(unsigned short*)(StringVar.AllocatorInstance.Data)
(wchar_t) $10 = 80
(lldb) p (char*)(unsigned short*)(StringVar.AllocatorInstance.Data)
(char *) $11 = 0x10653360 "P"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+1)
(char *) $14 = 0x10653362 "a"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+2)
(char *) $18 = 0x10653364 "r"

...so it looks like the cast to unsigned short is fine, as long as we cast each integer to a char. Any idea how I might try to put this in a Python data formatter?

Upvotes: 3

Views: 1733

Answers (2)

Jason Molenda

Reputation: 15395

Your Data looks like it is probably UTF-16. I made a quick C program that looks kind of like your question description and played around a little in the interactive Python interpreter. I think this might be enough to point you in the right direction for writing your own formatter?

int main ()
{
    struct String *mystr = AllocateString();
    mystr->AllocatorInstance.len = 10;
    mystr->AllocatorInstance.Data = (void *) malloc (10);
    memset (mystr->AllocatorInstance.Data, 0, 10);
    ((char *)mystr->AllocatorInstance.Data)[0] = 'h';
    ((char *)mystr->AllocatorInstance.Data)[2] = 'e';
    ((char *)mystr->AllocatorInstance.Data)[4] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[6] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[8] = 'o';

    FreeString (mystr);
}

Using the lldb.frame and lldb.process shortcuts (only valid in the interactive script interpreter), we can easily read the Data into a Python string buffer:

>>> valobj = lldb.frame.FindVariable("mystr")
>>> address = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data').GetValueAsUnsigned()
>>> size = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('len').GetValueAsUnsigned()
>>> print address
4296016096
>>> print size
10
>>> err = lldb.SBError()
>>> print err
error: <NULL>
>>> membuf = lldb.process.ReadMemory (address, size, err)
>>> print err
success
>>> membuf
'h\x00e\x00l\x00l\x00o\x00'

From this point you can do any of the usual Python sequence operations:

>>> for b in membuf:
...   print ord(b)
... 
104
0
101
0
108
0
108
0
111
0

I'm not sure how you can tell Python that this is UTF-16 and should be internalized correctly as wide chars; that's more a Python question than an lldb question. Either way, I think your best bet here is not to use the SBValue methods (because your Data pointer has an uninformative type like void *, as in my test program) but to use the SBProcess memory read method.
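On the open point about telling Python that the buffer is UTF-16: a byte buffer can be decoded with the 'utf-16-le' codec. The following is only a sketch under some assumptions: the data is little-endian (which matches the 'h\x00e\x00...' layout above), the len member holds a byte count as in the test program (the real string type may name or define its length differently), and decode_utf16_buffer is a hypothetical helper name.

```python
def decode_utf16_buffer(membuf):
    """Decode raw little-endian UTF-16 bytes, stopping at the first NUL code unit."""
    # bytearray behaves the same for a Python 2 str and a Python 3 bytes object.
    raw = bytearray(membuf)
    # Drop a trailing odd byte so only whole 16-bit code units are decoded.
    raw = raw[:len(raw) - (len(raw) % 2)]
    text = raw.decode('utf-16-le', 'replace')
    return text.split(u'\x00', 1)[0]


def MyStringSummary(valobj, internal_dict):
    import lldb  # only importable inside LLDB's script interpreter

    alloc = valobj.GetChildMemberWithName('AllocatorInstance')
    address = alloc.GetChildMemberWithName('Data').GetValueAsUnsigned()
    # Assumption: 'len' is a byte count, as in the test program above.
    size = alloc.GetChildMemberWithName('len').GetValueAsUnsigned()
    if address == 0 or size == 0:
        return '""'

    err = lldb.SBError()
    membuf = valobj.GetProcess().ReadMemory(address, size, err)
    if not err.Success():
        return '<read failed: %s>' % err.GetCString()
    return '"%s"' % decode_utf16_buffer(membuf)
```

Note that a summary function returns the string to display rather than printing it. Getting the process from valobj.GetProcess() avoids the lldb.process shortcut, which is only meant for interactive use.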

Upvotes: 5

Enrico Granata

Reputation: 3329

Without any source code references, this issue is a little harder to figure out than it should be.

With that said, my first bet is that your Char* type is an “opaque” reference, so when you dereference it LLDB knows nothing about the pointee type and can’t resolve it. Or maybe the pointee type is not a basic type (int, char, float, …) and as such does not have a value. Values are essentially a scalar property: structures, classes, and unions do not have values, they have members.

Can you publish the definition of your string type?

Working from there, there are a couple of ways to extract a chunk of data from a memory location. Is your string ASCII/UTF8 encoded? If so, you could just use SBProcess.ReadCStringFromMemory, giving it the value of the pointer. That reads until the first 0 terminator is found or until a given maximum length is reached (you want that limit to avoid reading unbounded amounts of data from garbled memory).
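A minimal sketch of that call, assuming an SBProcess is already at hand (for example from valobj.GetProcess() inside a formatter). The name read_c_string and the 256-byte cap are illustrative choices, not part of LLDB:

```python
def read_c_string(process, address, max_len=256):
    """Read a NUL-terminated 8-bit string, giving up after max_len bytes."""
    import lldb  # only importable inside LLDB's script interpreter

    err = lldb.SBError()
    text = process.ReadCStringFromMemory(address, max_len, err)
    # Return None rather than a partial result if the read failed.
    return text if err.Success() else None
```

For 16-bit data like that shown in the question, this would stop after the first character, just as the (char*) cast printed only "P", because the high byte of each code unit is 0.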

If that is not the case, there are other approaches.
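One such approach, sketched here under the assumption that the data is little-endian 16-bit code units ending in a 0 unit, is to step through memory two bytes at a time, which is the pointer arithmetic the question asks about. The helper names read_utf16_cstring and make_process_reader are hypothetical; only SBProcess.ReadMemory and SBError are LLDB API.

```python
import struct


def read_utf16_cstring(read_memory, address, max_units=1024):
    """Collect 16-bit code units from address until a 0 unit or max_units."""
    units = []
    for i in range(max_units):
        # Pointer arithmetic: each code unit is 2 bytes wide.
        chunk = read_memory(address + 2 * i, 2)
        if not chunk or len(chunk) < 2:
            break
        (unit,) = struct.unpack('<H', bytes(bytearray(chunk)))
        if unit == 0:
            break
        units.append(unit)
    raw = struct.pack('<%dH' % len(units), *units)
    return raw.decode('utf-16-le', 'replace')


def make_process_reader(process):
    """Adapt an SBProcess to the read_memory(address, size) callable above."""
    import lldb  # only importable inside LLDB's script interpreter

    def read_memory(address, size):
        err = lldb.SBError()
        data = process.ReadMemory(address, size, err)
        return data if err.Success() else None

    return read_memory
```

Inside a formatter this could be called as read_utf16_cstring(make_process_reader(valobj.GetProcess()), address). Reading two bytes per call keeps the sketch simple; reading a larger block once and scanning it in Python would likely be faster.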

Again, the more information you can provide about the internals of your data structure, the easier it gets to write a formatter for it.

Upvotes: 1
