Reputation: 1925
I am trying to split a PDF into multiple PDFs, all of which must have a similar file size. I want the PDF split by size, not by markers or every 10 pages or so. I understand that some pages take up more space than others because of elements like colors, figures, etc.
I tried using the function select_pages of the staplr package in a for loop to create a PDF, check its size, and remove it if it did not meet the wanted size. However, this process is very slow.
I need a fast way to get the size of every page of the PDF so I can split it by size.
Upvotes: 0
Views: 332
Reputation: 4929
If I understood correctly, you can do this with the pdftools package:
library(pdftools)
# Arguments
pdf_file <- "input.pdf" # file name
thres <- 2 # size threshold in MB
# Create temporary folder based on local time
tmp <- gsub(":|-| ", "", Sys.time())
dir.create(tmp)
# Split pages
invisible(pdf_split(pdf_file, paste0(tmp, '/page')))
# Get page files' names
pages <- list.files(tmp, full.names = TRUE)
# Get page files' sizes
page_sizes <- sapply(pages, function(page) file.info(page)$size) / 10^6
# Keep only pages whose size does not exceed the threshold
pages_ok <- pages[page_sizes <= thres]
# Do whatever you wanna do (here, I'm creating a pdf with acceptable page sizes)
pdf_combine(pages_ok, output = "output.pdf")
# Remove temporary folder
unlink(tmp, recursive = TRUE)
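The code above only filters out pages that are too large. To actually split by size, the per-page sizes could be grouped into consecutive chunks. Below is a minimal sketch of one way to do that, using a simple greedy rule; `group_pages_by_size` and `max_mb` are hypothetical names introduced for illustration, and the example sizes are made up.

```r
# Hypothetical helper (not part of pdftools): assign consecutive pages to
# chunks so that each chunk's summed size stays at or below `max_mb`.
# A single page larger than `max_mb` ends up alone in its own chunk.
group_pages_by_size <- function(sizes_mb, max_mb) {
  chunk <- integer(length(sizes_mb))
  current <- 1L   # index of the chunk being filled
  running <- 0    # summed size of the chunk being filled
  for (i in seq_along(sizes_mb)) {
    # Start a new chunk when adding this page would exceed the limit
    if (running > 0 && running + sizes_mb[i] > max_mb) {
      current <- current + 1L
      running <- 0
    }
    chunk[i] <- current
    running <- running + sizes_mb[i]
  }
  chunk
}

# Made-up page sizes in MB, for illustration only
sizes <- c(0.5, 0.7, 1.2, 0.3, 2.5, 0.4)
groups <- group_pages_by_size(sizes, max_mb = 2)
```

With the `pages` and `page_sizes` vectors from the code above (computed before the temporary folder is removed), `split(pages, group_pages_by_size(page_sizes, thres))` should give one vector of page files per chunk, and each of those could then be passed to `pdf_combine`. Note that the sum of single-page file sizes is only an estimate: resources such as fonts and images that several pages share are likely stored once in the combined file but repeated in each single-page file, so the merged chunks may come out smaller than the estimate.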
Upvotes: 2