Todd Munyon

Reputation: 1

.Thumbdata3 file extraction. TypeError: a bytes-like object is required, not 'str'

I'm aware there are similar threads and I've gone through them, but they didn't help in my case.

A while ago I saved two .thumbdata3 files that are about 500 MB each. This Stack Exchange thread claimed I could extract small JPEGs from the files using a Python script:

#!/usr/bin/python

"""extract files from Android thumbdata3 file"""

f=open('thumbdata3.dat','rb')
tdata = f.read()
f.close()

ss = '\xff\xd8'
se = '\xff\xd9'

count = 0
start = 0
while True:
    x1 = tdata.find(ss,start)
    if x1 < 0:
        break
    x2 = tdata.find(se,x1)
    jpg = tdata[x1:x2+1]
    count += 1
    fname = 'extracted%d03.jpg' % (count)
    fw = open(fname,'wb')
    fw.write(jpg)
    fw.close()
    start = x2+2

However it returned this error:

Traceback (most recent call last):
  File "... extract.py", line 15, in <module>
    x1 = tdata.find(ss,start)
TypeError: a bytes-like object is required, not 'str'

After searching around, I thought the error might come from a difference between Python 2.7 and 3.5, so I changed the 'rb' in the open() call to 'r', only to get this error:

Traceback (most recent call last):
  File "...\Thumbdata\thumbadata extract.py", line 6, in <module>
    tdata = f.read()
  File "...\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 277960004: character maps to <undefined>

It's worth mentioning that the script and the file are both in the same folder. I'm using Atom with a Python run package, as well as Anaconda3.

Any help is appreciated.

Upvotes: 0

Views: 1334

Answers (2)

DARK SIDE

Reputation: 31

Just keep the same code!

This error:

Traceback (most recent call last):
  File "... extract.py", line 15, in <module>
    x1 = tdata.find(ss,start)
TypeError: a bytes-like object is required, not 'str'

is caused by using strings instead of bytes-like objects here:

ss = '\xff\xd8'
se = '\xff\xd9'

To fix this, add a b prefix to those string literals:

ss = b'\xff\xd8'
se = b'\xff\xd9'
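Putting the fix together, a minimal sketch of the whole scan could look like the following. The function names extract_jpegs and write_jpegs are hypothetical. Beyond the b prefix, the sketch makes three changes that are assumptions about the intent of the original script: it slices to x2 + 2 so the second byte of the end marker is kept, it stops when no end marker is found, and it uses %03d for zero-padded file names.

```python
SOI = b'\xff\xd8'  # JPEG start-of-image marker (ss in the question)
EOI = b'\xff\xd9'  # JPEG end-of-image marker (se in the question)


def extract_jpegs(tdata):
    """Return a list of bytes objects, one per SOI..EOI span found in tdata."""
    images = []
    start = 0
    while True:
        x1 = tdata.find(SOI, start)
        if x1 < 0:
            break  # no more start markers
        x2 = tdata.find(EOI, x1 + 2)
        if x2 < 0:
            break  # start marker without a matching end marker
        # x2 points at the first byte of the end marker, so +2 keeps both bytes.
        images.append(tdata[x1:x2 + 2])
        start = x2 + 2
    return images


def write_jpegs(images, prefix='extracted'):
    """Write each extracted image to prefix001.jpg, prefix002.jpg, ..."""
    for count, jpg in enumerate(images, start=1):
        with open('%s%03d.jpg' % (prefix, count), 'wb') as fw:
            fw.write(jpg)
```

To run it on the file from the question, read the file with open('thumbdata3.dat','rb'), pass the result of read() to extract_jpegs, and pass the returned list to write_jpegs. This is a naive marker scan, so a thumbnail that itself embeds another JPEG may be cut at the inner end marker.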

Upvotes: 1

Usuario C

Reputation: 21

Keep opening the file in binary mode ('rb'), as in f=open('thumbdata3.dat','rb'), because the file contains binary data, not text.
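As a small illustration of why text mode fails here, the sketch below decodes a few bytes with cp1252, the codec named in the question's second traceback. The sample bytes are made up for the example; only the 0x81 value comes from the traceback.

```python
# Made-up sample: a JPEG start marker followed by the byte 0x81.
raw = b'\xff\xd8\x81'

try:
    raw.decode('cp1252')  # roughly what open(..., 'r') does on this system
    decoded_ok = True
except UnicodeDecodeError:
    # 0x81 has no character assigned in cp1252, so decoding fails.
    decoded_ok = False

print(decoded_ok)
```

Binary mode skips this decoding step entirely and returns the raw bytes.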

The problem is that f is a binary stream, so tdata is a bytes object and its find method expects a bytes argument. This strict separation between str and bytes is new in Python 3.

ss and se were assigned string literals, so their type is str (I guess ss and se stand for string start and string end).
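The mismatch can be reproduced on a few bytes without the large file. This is a minimal sketch with made-up sample data:

```python
# Made-up sample data containing a JPEG start marker at index 2.
data = b'\x00\x01\xff\xd8\x02'

try:
    data.find('\xff\xd8')  # str argument against a bytes object
    raised = False
except TypeError:
    raised = True

pos = data.find(b'\xff\xd8')  # bytes argument works

print(raised, pos)
```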

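You need to convert those strings to bytes using the encode() function. Note that encode() defaults to UTF-8, which turns each of these two characters into two bytes, so pass 'latin-1' to get exactly one byte per character:

x1 = tdata.find(ss.encode('latin-1'),start)

x2 = tdata.find(se.encode('latin-1'),x1)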
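How the choice of encoding affects the marker bytes can be checked directly. The sketch below compares the default encoding with latin-1 for the start marker from the question:

```python
ss = '\xff\xd8'

utf8 = ss.encode()              # default encoding is UTF-8
latin1 = ss.encode('latin-1')   # one byte per code point below 256

print(utf8)    # four bytes, not the JPEG marker
print(latin1)  # the two marker bytes
```

Only the latin-1 result matches the two-byte JPEG marker, so a search using the UTF-8 result would run without an error but would not find the marker.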

Please test it and comment with the output to confirm that it works.

Upvotes: 1
