hug

Reputation: 295

Read several PDF files into R with pdf_text

I have several PDF files in my directory. I have downloaded them previously, no big deal so far.

I want to read all those files into R. My idea was to use the "pdf_text" function from the "pdftools" package and write a call like this:

mypdftext <- pdf_text(files)

Here "files" is an object that holds all the PDF file names, so that I don't have to type every name by hand. Because I have actually downloaded a lot of files, it would save me from writing:

mypdftext <- pdf_text("file1.pdf", "file2.pdf", and many more files...)

To create the object "files", I used "files <- list.files(pattern = "pdf$")". The "files" vector contains all the PDF file names.

But "files" does not work with the pdf_text function, probably because it's a vector. What can I do instead?

Upvotes: 2

Views: 4658

Answers (2)

Tomas -

Reputation: 91

Maybe this is not the best solution, but it works for me:

library(pdftools)

# Set your path here.
your_path <- 'C:/Users/.../pdf_folder'
setwd(your_path)
getwd()

# List only the PDF files in the working directory.
lf <- list.files(path = getwd(), pattern = "\\.pdf$", all.files = FALSE,
                 full.names = FALSE)

# Create an empty list with one slot per file.
my_pdfs <- vector("list", length(lf))

# Iterate. pdf_text() returns one string per page, so store each
# result with [[i]] to keep all pages of a file together.
for (i in seq_along(lf)) {
  my_pdfs[[i]] <- pdf_text(lf[i])
}

# Calling the first pdf of the list.
my_pdfs[[1]]

Then you can assign each of the PDFs to whatever object you want. Each file's text is stored in its own element of the list. Does this solve your problem?
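To show how a list like the one the loop is meant to build can be indexed, here is a small sketch. It uses made-up placeholder strings instead of real PDF text (the contents are hypothetical), relying only on the fact that pdf_text returns one string per page:

```r
# Hypothetical stand-in for the loop's result:
# one character vector per PDF, one string per page.
my_pdfs <- list(
  c("file1 page 1", "file1 page 2"),
  c("file2 page 1")
)

length(my_pdfs)    # number of documents
my_pdfs[[1]]       # all pages of the first document (character vector)
my_pdfs[[1]][2]    # second page of the first document
lengths(my_pdfs)   # number of pages per document
```

Note the double brackets: my_pdfs[1] would return a one-element list, while my_pdfs[[1]] returns the character vector itself.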

Upvotes: 3

You could try using lapply over the vector that contains the location of every PDF file (files). I would recommend using list.files(..., full.names = TRUE) to get the complete path of each PDF file. pdf_text reads one file per call, so lapply calls it once per file and returns a list. This should work:

mypdfs <- lapply(files, pdf_text)
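A slightly fuller sketch of the same idea, assuming the PDFs sit in a folder called "pdf_folder" (a hypothetical name; substitute your own directory). The naming and collapsing steps are optional additions, not part of the answer above:

```r
library(pdftools)

# Hypothetical folder name; replace with the directory holding your PDFs.
pdf_dir <- "pdf_folder"

# Full paths, so the calls work regardless of the working directory.
files <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE,
                    ignore.case = TRUE)

# One pdf_text() call per file; each result is a character vector
# with one element per page, so mypdfs is a list of such vectors.
mypdfs <- lapply(files, pdf_text)
names(mypdfs) <- basename(files)

# Optionally collapse each document's pages into a single string.
full_text <- vapply(mypdfs, paste, character(1), collapse = "\n")
```

Naming the list by file name lets you look up a document as mypdfs[["file1.pdf"]] rather than by position.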

Upvotes: 1
