Reputation: 51
I am using pdftools to convert a PDF to Excel, and I want to extract the table values. The code below runs without errors, but it pastes everything into rows: the values are not separated into different columns. All I want is the table as it appears in the PDF (the data and values). Can someone help with the code? Maybe we need a separator? Three images are attached below: the Excel output I get, the expected Excel output, and the PDF I am working with.
library(pdftools)
tx<-pdf_text("Path")
tx2<-strsplit(tx,"\n")
library(xlsx)
write.xlsx(tx2,file="ds.xlsx")
Upvotes: 0
Views: 15743
Reputation: 18425
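Try this. It splits the text into lines, then splits each line into five columns wherever two or more whitespace characters occur in a row: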
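library(pdftools)
library(stringr)
library(xlsx)
# Read the PDF: one character string per page
tx <- pdf_text("Path")
# Split every page into individual lines
tx2 <- unlist(str_split(tx, "[\\r\\n]+"))
# Trim each line, then split it into 5 columns on runs of 2+ whitespace characters
tx3 <- str_split_fixed(str_trim(tx2), "\\s{2,}", 5)
write.xlsx(tx3, file="ds.xlsx")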
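If the number of columns is not known in advance, the same idea can be sketched in base R without hard-coding 5. This is only an illustration under stated assumptions: `split_columns` is a hypothetical helper name, and `sample_page` is made-up text standing in for what `pdf_text()` might return, not the actual PDF from the question.

```r
# Hypothetical helper: turn whitespace-aligned text into a character matrix.
# Assumes columns are separated by at least two whitespace characters.
split_columns <- function(page_text) {
  # One element per line, across all pages
  lines <- unlist(strsplit(page_text, "[\r\n]+"))
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines
  # Split each line on runs of 2+ whitespace characters
  cells <- strsplit(lines, "\\s{2,}", perl = TRUE)
  # Pad short rows with "" so every row has the same number of columns
  n_col <- max(lengths(cells))
  do.call(rbind, lapply(cells, function(x) c(x, rep("", n_col - length(x)))))
}

# Made-up sample text (an assumption, not the real PDF content)
sample_page <- "Item      Qty   Price\nApple     3     1.20\nPear      10\n"
tbl <- split_columns(sample_page)
print(tbl)
```

The resulting matrix can be passed to `write.xlsx()` in the same way as `tx3`. Note the limitation: cells separated by only a single space end up merged into one column, and a cell that itself contains a run of two or more spaces is split in two.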
Upvotes: 7