Python read() api anomaly

Question

Hi, here is a snippet from my Python script:

#seek to the symtab_offset    
elf_fp.seek(self.symtab_sh.sh_offset)

#each entry is 16 bytes, so num_entries = size/16
num_entries = self.symtab_sh.sh_size/16
symbol_list = []
counter = 0
prev=0
for _ in range(num_entries):
    counter+=1
    s = struct.Struct('IIIccH' )
    prev = elf_fp.tell()
    print str(counter) +"  " +str(elf_fp.tell()) +"/" + str(hex(elf_fp.tell())),
    buffer = elf_fp.read(16)
    print " diff: " +str(elf_fp.tell() - prev)
    if len(buffer) !=16:
        continue
    unpacked_data = s.unpack(buffer)
    name          = unpacked_data[0]
    value         = unpacked_data[1]
    size          = unpacked_data[2]
    types         = unpacked_data[3]
    #print str(size) +"," +str(types.encode('hex'))
    #only add non-zero size entries
    if size and name:
       symbol_list.append({"name":name,"value":value, "size": size, "type": types})

This snippet reads 16 bytes at a time from an ELF file's symbol table and unpacks each entry with a struct format. The problem is that in a big ELF file with more than 100 symbols, I can successfully decode the symbol information for roughly the first 100 symbols, but not for the last few.

If I look at my log, I can see that the read() API is acting weird. After reading 16 bytes from the file, it should advance the file pointer by 16 bytes. Instead, in some places it advances the pointer by a much larger, seemingly arbitrary offset.

Here is a log snippet:

107  36056/0x8cd8L  diff: 16
108  36072/0x8ce8L  diff: 16
109  36088/0x8cf8L  diff: 16
110  36104/0x8d08L  diff: 2864
111  38968/0x9838L  diff: 16

You can see that for the 110th symbol, the read causes a jump of 2864 bytes. Any idea why read() behaves this way? Are there known problems with Python's read() API?

Robᵩ · Accepted Answer

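You've opened the file in 'r' mode, which is text mode. For file.tell() to report meaningful offsets, you must open the file in 'rb', which is binary mode. In text mode on Windows, line endings are translated on the fly, so the position reported by tell() may not match the number of bytes you actually read.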
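As a rough illustration of the fix, here is a minimal sketch of the same loop reading from a file object opened in binary mode. The helper name read_symbols, the '<IIIBBH' format (which assumes a little-endian 32-bit ELF and reads the two one-byte fields as integers rather than characters), and the in-memory demo data are assumptions for illustration, not part of the original script.

```python
import io
import struct

# Assumed 16-byte entry layout: name, value, size, info, other, section index.
# '<' assumes little-endian data and disables padding; adjust for other targets.
SYM = struct.Struct('<IIIBBH')

def read_symbols(fp, offset, size):
    """Read fixed-size symbol entries from a file object opened in binary mode."""
    fp.seek(offset)
    symbols = []
    for _ in range(size // SYM.size):
        start = fp.tell()
        buf = fp.read(SYM.size)
        if len(buf) != SYM.size:
            break  # short read: end of file reached
        # In binary mode the position advances by exactly the bytes read.
        assert fp.tell() - start == SYM.size
        name, value, sym_size, info, _other, _shndx = SYM.unpack(buf)
        # Only keep entries with a non-zero size and name, as in the question.
        if sym_size and name:
            symbols.append({"name": name, "value": value,
                            "size": sym_size, "type": info})
    return symbols

# In-memory demo; io.BytesIO behaves like a file opened with 'rb'.
# The bytes 0x0d, 0x0a and 0x1a are included on purpose: they are the kind
# of values that text mode may treat specially on some platforms.
demo = b"\x00" * 8 + SYM.pack(1, 0x1000, 4, 0x0d, 0x0a, 0x1a) + SYM.pack(5, 2, 8, 1, 0, 1)
demo_symbols = read_symbols(io.BytesIO(demo), 8, 2 * SYM.size)
```

With a real file, the equivalent call would be made on a handle obtained from open(path, 'rb'), passing the symbol table's sh_offset and sh_size as offset and size.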
