Reputation: 11
I am new to this platform and I hope someone can help me.
I have imported some PDF files into RStudio using the pdftools library. Now I want to turn this text into structured columns, but I can't seem to get the structure right.
This is an example of one of the files I imported. I want to turn the yellow-shaded lines into a data table.
This is the outcome I would ultimately like to have.
I have tried the code below, but I can't get the result into a data table.
library(pdftools)
library(stringr)
library(dplyr)
# list the PDF files in the working directory
files <- list.files(pattern = "pdf$", full.names = TRUE)
# read the text of each PDF (one list element per file, one string per page)
filestext <- lapply(files, pdf_text)
# split the text into lines at "\n"
filestext <- str_split(filestext, pattern = "\n")
This is the result I get:
Does anyone know the easiest way to solve this?
Upvotes: 1
Views: 1005
Reputation: 130
I would also give https://sensible.so a shot. We have some great documentation and a free plan just for projects like this. Plus, when you sign up there are some tutorials to help you understand how to extract different types of data. I bet you can have this extracted into a clean JSON object in no time.
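If you would rather stay in R, here is a minimal sketch of one possible approach, not a tested solution for your files. It splits the page text into one row per line, which is usually easier to filter than a nested list. The helper name `pages_to_lines`, the sample text, and the `^Total:` pattern are all made up for illustration, since I can't see the layout of your PDFs; you would need to replace the pattern with whatever identifies the highlighted lines.

```r
# Hypothetical helper: turn the page texts of one PDF into one row per line.
# `pages` has the shape pdftools::pdf_text() returns: one string per page.
pages_to_lines <- function(pages, file = NA_character_) {
  # split every page at newlines; gives a list with one vector per page
  lines_per_page <- strsplit(pages, "\n", fixed = TRUE)
  text <- trimws(as.character(unlist(lines_per_page)))
  page <- rep(seq_along(lines_per_page), lengths(lines_per_page))
  data.frame(
    file = rep(file, length(text)),
    page = page,
    text = text,
    stringsAsFactors = FALSE
  )
}

# Made-up text standing in for the output of pdf_text() on a two-page file
example_pages <- c(
  "Invoice 001\nTotal:   12.50\n",
  "Page two\nTotal:   3.00\n"
)
lines_df <- pages_to_lines(example_pages, file = "example.pdf")

# drop empty lines
lines_df <- lines_df[nzchar(lines_df$text), ]

# Assumed pattern: keep the lines of interest and pull the value into a column
totals <- lines_df[grepl("^Total:", lines_df$text), ]
totals$value <- as.numeric(sub("^Total:\\s*", "", totals$text))
```

With real files, the same helper could be applied per file, for example `do.call(rbind, lapply(files, function(f) pages_to_lines(pdf_text(f), f)))`, which keeps the file name and page number next to every line.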
Upvotes: -3