Bunty

Reputation: 51

R Script pdf to excel using pdftools

I am using pdftools to convert a PDF to Excel, and I want to extract the table values. The code below runs without errors, but it pastes everything into rows: the values are not separated into different columns. All I want is the table as it appears in the PDF (the data and values). Can someone help with the code? Maybe we need a separator? Three images below: the Excel output I get, the expected Excel output, and the PDF I am working with.

library(pdftools)
tx<-pdf_text("Path")
tx2<-strsplit(tx,"\n")
library(xlsx)
write.xlsx(tx2,file="ds.xlsx")

Upvotes: 0

Views: 15743

Answers (1)

Andrew Gustar

Reputation: 18425

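Try this. It splits the text into lines, then splits each line into columns wherever two or more whitespace characters occur in a row. The 5 is the number of columns to produce, so adjust it to match your table: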

library(pdftools)
library(stringr)
library(xlsx)

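# Read the PDF; each page comes back as one character string
tx <- pdf_text("Path")
# Break the pages into individual lines
tx2 <- unlist(str_split(tx, "[\\r\\n]+"))
# Trim each line, then split it into 5 columns on runs of 2+ whitespace characters
tx3 <- str_split_fixed(str_trim(tx2), "\\s{2,}", 5)

# Write the resulting character matrix to Excel
write.xlsx(tx3, file = "ds.xlsx")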
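
To see what the split does without needing the PDF, here is a minimal sketch on a made-up string standing in for one page of pdf_text() output. The sample text and the names page, cells, n_col and tbl are hypothetical, and it uses base R (strsplit, trimws) rather than stringr. Instead of hard-coding 5, it takes the column count from the widest row, which may or may not suit your table.

```r
# Hypothetical sample standing in for one page returned by pdf_text()
page <- "Item        Qty    Price\nWidget       2      3.50\nGadget      10     12.00\n"

# Split the page into lines, trim them, and drop any empty ones
lines <- trimws(unlist(strsplit(page, "[\r\n]+")))
lines <- lines[nzchar(lines)]

# Split each line on runs of two or more whitespace characters
cells <- strsplit(lines, "\\s{2,}", perl = TRUE)

# Pad shorter rows with "" so every row has the same number of columns
n_col <- max(lengths(cells))
tbl <- t(vapply(cells,
                function(x) c(x, rep("", n_col - length(x))),
                character(n_col)))

print(tbl)
```

The matrix tbl can be passed to write.xlsx() in the same way as tx3 above. Note that a cell containing two or more consecutive spaces would be split in two, and a cell separated from its neighbour by a single space would be merged with it, so check the result against the PDF.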

Upvotes: 7
