mwtmurphy

Using R to transform all pages of a PDF into text, for multiple files

I'm using the 'pdf_render_page' function in a loop to render each page of a PDF document as a bitmap, which I then convert to raw text with the tesseract package. However, the loop only works if I already know how many pages the file has. Does anyone know a way to take a PDF with an unknown number of pages, find its page count, and then run this loop?

Upvotes: 2

Views: 967

Answers (1)

mwtmurphy

When using the pdftools package, you can get the number of pages in the PDF 'dummy.pdf' from pdf_info():

library(pdftools)
pdf_length <- pdf_info("dummy.pdf")$pages
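
Plugging that page count into the render-and-OCR loop from the question might look like the sketch below. It assumes the pdftools, tesseract and png packages are installed; the helper name ocr_pdf(), the 300 dpi setting and the "pdfs" folder are illustrative choices, not part of the original answer. Each page is rendered with pdf_render_page(), encoded as PNG bytes with png::writePNG() and passed to tesseract::ocr().

```r
# Assumes pdftools, tesseract and png are installed.
library(pdftools)
library(tesseract)

# Hypothetical helper: OCR every page of one PDF, whatever its length.
# dpi = 300 is an assumed setting; higher resolutions usually help OCR.
ocr_pdf <- function(path, dpi = 300, engine = tesseract("eng")) {
  # Look up the page count instead of hard-coding it
  n_pages <- pdf_info(path)$pages
  vapply(seq_len(n_pages), function(i) {
    # Render page i to a raw bitmap array
    bitmap <- pdf_render_page(path, page = i, dpi = dpi)
    # writePNG() with no target returns PNG bytes, which ocr() accepts
    ocr(png::writePNG(bitmap), engine = engine)
  }, character(1))
}

# Apply it to every PDF in a folder ("pdfs" is a hypothetical path);
# the result is a named list with one character vector of page texts per file.
pdf_files <- list.files("pdfs", pattern = "\\.pdf$", full.names = TRUE)
all_text <- setNames(lapply(pdf_files, ocr_pdf), basename(pdf_files))
```

Because ocr_pdf() returns one string per page, all_text[["report.pdf"]][2] would give the text of page 2 of a file named report.pdf, and paste(..., collapse = "\n") joins a whole document if you need it as a single string.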

Upvotes: 1
