Reputation: 53873
I'm building a system that handles PDF file data (for which I use the PyPDF2 library). I now receive a base64-encoded PDF, which I can decode and store correctly using the following:
import base64
# base64FileData <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'w') as theFile:
    theFile.write(fileData)
I now want to use this fileData as a binary file to split it up, but when I do type(fileData), the fileData turns out to be a <type 'str'>. How can I convert this fileData to be a binary (or at least not a string)?
All tips are welcome!
[EDIT]
If I do open(fileData, 'rb') I get an error, saying
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
To remove the null bytes I tried fileData.rstrip(' \t\r\n\0') and fileData.rstrip('\0') and fileData.partition(b'\0')[0], but nothing seems to work. Any ideas?
[EDIT2]
The thing is that I pass this string to the PyPDF2 PdfFileReader class, which on lines 909 to 912 does the following (in which stream is the fileData I provide):
if type(stream) in (string_type, str):
    fileobj = open(stream, 'rb')
    stream = BytesIO(b_(fileobj.read()))
    fileobj.close()
So because it's a string, it assumes it is a filename, after which it tries to open the file. This then fails with a TypeError. So before feeding the fileData to the PdfFileReader I need to somehow convert it to something other than str so that it doesn't try to open it, but just considers fileData a file in itself. Any ideas?
Upvotes: 1
Views: 9687
Reputation: 51
For example, suppose your input data comes from this:
with open(local_image_path, "rb") as imageFile:
    str_image_data = base64.b64encode(imageFile.read())
Then, to get the binary data in a variable, you can try:
import io
import base64
binary_image_data = io.BytesIO(base64.decodebytes(str_image_data))
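Applied to the question's case, a minimal sketch might look like the following. It assumes Python 3, and sample_bytes and base64_file_data are made-up placeholders standing in for a real PDF and the incoming base64 text:

```python
import base64
import io

# Placeholder payload standing in for real PDF bytes (contains a NUL byte on purpose).
sample_bytes = b"%PDF-1.4\n\x00fake body\n%%EOF\n"

# Simulate the incoming base64 text (URL-safe alphabet, as in the question).
base64_file_data = base64.urlsafe_b64encode(sample_bytes).decode("ascii")

# Decode back to raw bytes, then wrap them in an in-memory binary stream.
file_data = base64.urlsafe_b64decode(base64_file_data.encode("ascii"))
stream = io.BytesIO(file_data)

# The stream supports read() and seek(), like a file opened with 'rb'.
header = stream.read(8)
stream.seek(0)  # rewind so the next consumer starts at the beginning
```

Because stream is not a str, the type check quoted in the question should not mistake it for a filename, so it can presumably be handed to PdfFileReader as is.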
Upvotes: 2
Reputation: 81
You have to open the file in binary mode, i.e. use 'wb'; otherwise it basically gets saved as "text".
import base64
# base64FileData <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'wb') as theFile:
    theFile.write(fileData)
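A quick way to check that 'wb' keeps the data intact is a round trip. This is only a sketch, assuming Python 3; the sample bytes and the temporary path are hypothetical stand-ins for the real PDF and its location:

```python
import base64
import os
import tempfile

# Made-up payload with CR/LF and a NUL byte, i.e. bytes that text mode may
# translate (newline handling) or, in Python 3, refuse to write at all.
original = b"%PDF-1.4\r\n\x00binary\n%%EOF"
encoded = base64.urlsafe_b64encode(original)

file_data = base64.urlsafe_b64decode(encoded)

# Write the decoded bytes to a temporary file in binary mode.
path = os.path.join(tempfile.mkdtemp(), "thefilename.pdf")
with open(path, "wb") as the_file:
    the_file.write(file_data)

# Read the file back in binary mode so the bytes can be compared.
with open(path, "rb") as the_file:
    round_tripped = the_file.read()
```

If round_tripped equals original, the file on disk holds exactly the decoded bytes.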
Upvotes: 3