Reputation: 85
I am trying to extract images of all formats from a PDF. After some searching I found this page on StackOverflow and tried its code, but I am getting this error:
I am using Python 3.x, and the code I am using is below. I went through the comments but couldn't figure it out. Please help me resolve this.
Here is the sample PDF.
import PyPDF2
from PIL import Image

if __name__ == '__main__':
    input1 = PyPDF2.PdfFileReader(open("Aadhaar1.pdf", "rb"))
    page0 = input1.getPage(0)
    xObject = page0['/Resources']['/XObject'].getObject()

    for obj in xObject:
        if xObject[obj]['/Subtype'] == '/Image':
            size = (xObject[obj]['/Width'], xObject[obj]['/Height'])
            data = xObject[obj].getData()
            if xObject[obj]['/ColorSpace'] == '/DeviceRGB':
                mode = "RGB"
            else:
                mode = "P"

            if xObject[obj]['/Filter'] == '/FlateDecode':
                img = Image.frombytes(mode, size, data)
                img.save(obj[1:] + ".png")
            elif xObject[obj]['/Filter'] == '/DCTDecode':
                img = open(obj[1:] + ".jpg", "wb")
                img.write(data)
                img.close()
            elif xObject[obj]['/Filter'] == '/JPXDecode':
                img = open(obj[1:] + ".jp2", "wb")
                img.write(data)
                img.close()
While reading the comments and following the links, I found that this problem was solved on this page. Can someone please help me implement it?
Upvotes: 4
Views: 3576
Reputation: 29
I get the same error with Python 3.9 and PyPDF2 1.26 at the time of writing.
    data = xObject[obj].getData()
was the problem line. My PDF had JPG images, and that line failed with the same NotImplementedError. Changing the line for the /DCTDecode part to:
    data = xObject[obj]._data
kind of worked for me. This gives the plain JPG stream as stored in the PDF. In other words, use a separate data = ... line inside each if/filter section. I have not tried the JP2 part, though.
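For what it's worth, here is a rough sketch of that "separate data line per filter" idea as a small helper. The function name image_bytes_and_extension and the PASSTHROUGH_EXTENSIONS table are my own, not part of PyPDF2. It assumes the stream behaves like a PyPDF2 1.26 stream object, and it relies on the private _data attribute, which could change between versions. The /JPXDecode branch is untested, as noted above.

```python
# Filters whose undecoded stream bytes are already a complete image file.
PASSTHROUGH_EXTENSIONS = {
    "/DCTDecode": ".jpg",  # JPEG
    "/JPXDecode": ".jp2",  # JPEG 2000 (assumed to behave like the JPEG case)
}


def image_bytes_and_extension(stream):
    """Return (data, extension) for an image XObject stream.

    `stream` is assumed to offer dict-style access to '/Filter', a
    getData() method, and the private `_data` attribute holding the
    undecoded bytes, as PyPDF2 1.26 stream objects do.
    The extension is None when the data is decoded pixel data rather
    than a ready-made image file.
    """
    filter_name = stream.get("/Filter")

    # '/Filter' may be a one-element array instead of a single name.
    if isinstance(filter_name, list) and len(filter_name) == 1:
        filter_name = filter_name[0]

    if isinstance(filter_name, str) and filter_name in PASSTHROUGH_EXTENSIONS:
        # Avoid getData() here: it is the call that raised
        # NotImplementedError for these filters.
        return stream._data, PASSTHROUGH_EXTENSIONS[filter_name]

    # Other filters (for example /FlateDecode) still go through getData().
    return stream.getData(), None
```

Inside the loop from the question you would call data, ext = image_bytes_and_extension(xObject[obj]). If ext is not None, write data to obj[1:] + ext in binary mode; otherwise pass data to Image.frombytes as before.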
Upvotes: 0
Reputation: 127
As of today, I'm still getting the error NotImplementedError: unsupported filter /DCTDecode
I have PyPDF2 v1.26.0 installed and am using Python 3.7.5. My Python code is the same as above.
Is there a solution yet?
Upvotes: 0
Reputation: 38
This is an error in the PyPDF2 library itself. Try uninstalling the library and reinstalling it with the changes applied, or look at the changes on GitHub and apply them yourself. I hope that works.
Upvotes: 1