Nick

Reputation: 181

How To Extract Data From PDF In Python Using PDFrw

I am trying to use PDFrw to get data from a certain PDF (let's say the one at the top right of the page HERE). I have looked through the documentation they provide (I couldn't find much) and at the example code they posted on git, but I can't piece together enough information to do what I want. How would I write a simple program that uses PDFrw (or another library, if there is a better one) to open the PDF and extract a certain piece of text? I was thinking about converting it to HTML; would that be easier? Taking the PDF above as an example, I would like to get, say, the voltage, which in the PDF is 600 w. How would I go about doing this in the simplest way? I couldn't find any other Stack Overflow questions about this, so hopefully someone who has used it before can help!

Thanks!

Upvotes: 4

Views: 3998

Answers (1)

Patrick Maupin

Reputation: 8127

I am the author of pdfrw, and it's not really designed for this. You should probably look at pdfminer.
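
For illustration, a minimal sketch of what the pdfminer route could look like, assuming the pdfminer.six package is installed and that the label and its value end up on the same line of the extracted text (PDF layout often breaks this assumption, so check the raw text first). The file name `datasheet.pdf` and the helper names `extract_pdf_text` and `find_labelled_value` are hypothetical.

```python
import re


def extract_pdf_text(path):
    """Return all text in the PDF at `path` using pdfminer.six."""
    # Imported lazily so the parsing helper below works without pdfminer installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def find_labelled_value(text, label):
    """Return whatever follows `label` on the same line, or None if absent."""
    # Allow an optional ':' or '-' separator between the label and the value.
    pattern = re.compile(re.escape(label) + r"\s*[:\-]?\s*(.+)", re.IGNORECASE)
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1).strip()
    return None


if __name__ == "__main__":
    text = extract_pdf_text("datasheet.pdf")  # hypothetical file name
    print(find_labelled_value(text, "Voltage"))
```

If the value turns out to sit on a different line from its label, printing the extracted text and adjusting the search to match its actual layout is probably the simplest next step.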

Upvotes: 14
