Maintaining Data Type in Colaboratory

Question

I'm trying to use PyPDF2 to read a PDF document and output a plain text string. However, when I upload my PDF file to Colaboratory using the code:

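from google.colab import files

# Open the browser upload dialog; returns a dict of {filename: file contents}
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))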

it automatically converts it to a str type rather than keeping it as an encoded string. This gives an error with PyPDF2.PdfFileReader(), but if you print the string it still has all the encoded characters:

gsutilCheatSheet.pdf => %PDF-1.5 %�� 1 0 obj <>/Metadata 117 0 R/ViewerPreferences 118 0 R>> endobj

etc.

Is there any way to keep the imported document in its original encoded format, or is there another way to remove the encoding once it is already a str?

Answers (1)
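
One possible approach, offered as a sketch rather than a confirmed fix: files.upload() returns a dict that maps each filename to the file's raw contents. On a Python 2 runtime, bytes and str are the same type, which may be why the value shows up as str even though the data is unchanged. PdfFileReader expects a file-like object rather than a raw string, so wrapping the uploaded value in io.BytesIO is likely enough. The helper names as_pdf_stream and extract_text, and the sample bytes, are hypothetical and used only for illustration. The PyPDF2 calls assume the legacy 1.x API (PdfFileReader, getNumPages, getPage, extractText), the same API the question uses.

```python
import io


def as_pdf_stream(data):
    """Wrap raw uploaded bytes in a seekable, file-like object."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected raw bytes, got %s" % type(data).__name__)
    return io.BytesIO(data)


def extract_text(data):
    """Return the text of every page (assumes the legacy PyPDF2 1.x API)."""
    import PyPDF2  # imported lazily so as_pdf_stream works without PyPDF2

    reader = PyPDF2.PdfFileReader(as_pdf_stream(data))
    pages = (reader.getPage(i).extractText()
             for i in range(reader.getNumPages()))
    return "\n".join(pages)


# Hypothetical stand-in for uploaded[fn]; a real upload holds the whole file.
sample = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"
stream = as_pdf_stream(sample)
print(stream.read(8))  # the PDF header, showing the bytes are intact
```

In a Colab notebook this would be called as text = extract_text(uploaded[fn]) inside the upload loop. Text extraction quality depends on how the PDF was produced, so the result may still need cleanup.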
