Aakash Basu

Reputation: 1767

How to extract charts/tables/graphs from PDF files using Python?

I searched quite a bit but couldn't find a solution to this kind of problem, so I'm posting a specific question about it. Most existing answers cover plain image or text extraction, which is comparatively easy.

I need to extract tables as text (CSV) and graphs as images from PDF files.

Can anyone suggest an efficient way to do this in Python 3.6?

So far I can extract JPEGs by scanning the raw file bytes between startmark = b"\xff\xd8" and endmark = b"\xff\xd9", but not all tables and graphs in a PDF are stored as plain JPEGs, so my code misses most of them.
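The marker-based scan described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code used: the function names `extract_jpegs` and `dump_jpegs` are hypothetical, and it assumes the whole PDF fits in memory. It only finds byte spans that sit between the two JPEG markers, so it cannot see tables or charts that are drawn as vector graphics or stored in another image encoding.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker (startmark in the question)
EOI = b"\xff\xd9"  # JPEG end-of-image marker (endmark in the question)


def extract_jpegs(data):
    """Return each byte span that starts at SOI and ends at the next EOI."""
    images = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start == -1:
            break  # no further start marker
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break  # start marker without a matching end marker
        end += len(EOI)  # include the end marker itself
        images.append(data[start:end])
        pos = end  # continue scanning after this span
    return images


def dump_jpegs(pdf_path, prefix="image"):
    """Hypothetical helper: write each span found in pdf_path to prefix_N.jpg."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    names = []
    for i, blob in enumerate(extract_jpegs(data)):
        name = "{}_{}.jpg".format(prefix, i)
        with open(name, "wb") as out:
            out.write(blob)
        names.append(name)
    return names
```

Calling `dump_jpegs("annual-report-2017.pdf")` would write one file per span found. Because the scan stops at the first end marker after each start marker, a span can also be cut short if those two bytes happen to occur inside the image data (for example in an embedded thumbnail), which is another reason this approach is fragile.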

For example, I want to extract the table on page 11 and the graphs on page 12 of the PDF linked below, as images or in whatever other form is feasible. How should I go about it?

https://hartmannazurecdn.azureedge.net/media/2369/annual-report-2017.pdf

Upvotes: 7

Views: 15068

Answers (0)
