Alvaro Morales

Reputation: 1925

How to get the file size (in MB, not the page dimensions) of every page of a PDF using R?

I am trying to split a PDF into multiple PDFs that all have roughly the same file size. I want the PDF split by size, not by markers or every 10 pages or so. I understand that some pages are bigger than others because of elements like colors, figures, etc.

I tried using the select_pages function from the staplr package in a for loop: create a PDF, check its size, and remove it if it doesn't meet the target size. But this process is very slow.

I need a fast way to get the size of every page of the PDF so I can split it by size.

Upvotes: 0

Views: 332

Answers (1)

If I understand correctly, you can do this with the pdftools package:

library(pdftools)

# Arguments
pdf_file <- "input.pdf" # input file name
thres <- 2              # maximum page size in MB

# Create a temporary folder for the single-page files
tmp <- tempfile("pages")
dir.create(tmp)

# Split the PDF into one file per page; pdf_split() returns the
# new file names in page order (list.files() would sort page_10 before page_2)
pages <- pdf_split(pdf_file, output = file.path(tmp, "page"))
# Get the size of each page file in MB
page_sizes <- file.size(pages) / 10^6

# Keep only the pages at or below the threshold
pages_ok <- pages[page_sizes <= thres]

# Do whatever you want with them (here, combine the acceptable pages into one PDF)
pdf_combine(pages_ok, output = "output.pdf")

# Remove the temporary folder
unlink(tmp, recursive = TRUE)
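The question asks for chunks of similar size rather than dropping large pages. Below is a minimal sketch of that, under some assumptions: it greedily groups consecutive pages until adding the next page would exceed max_mb, then writes each group with qpdf::pdf_subset() (the qpdf package is installed alongside pdftools). The names group_pages_by_size() and split_pdf_by_size() and the chunk_NN.pdf naming are hypothetical. Summing single-page file sizes only approximates a chunk's size: shared resources such as fonts are duplicated in every single-page file, so real chunks are often smaller than the estimate.

```r
# Hypothetical helper: greedily assign consecutive pages to groups so that
# the summed page size of each group stays at or below max_mb.
# A single page larger than max_mb still ends up alone in its own group.
group_pages_by_size <- function(sizes, max_mb) {
  groups <- integer(length(sizes))
  group <- 1L
  running <- 0
  for (i in seq_along(sizes)) {
    # Start a new group if this page would push the current one over the limit
    if (running > 0 && running + sizes[i] > max_mb) {
      group <- group + 1L
      running <- 0
    }
    groups[i] <- group
    running <- running + sizes[i]
  }
  # List of integer page-index vectors, one per group, in page order
  split(seq_along(sizes), groups)
}

# Hypothetical wrapper: measure per-page sizes, group them, and write one
# PDF per group. Requires the qpdf package (a dependency of pdftools).
split_pdf_by_size <- function(pdf_file, max_mb, out_prefix = "chunk") {
  tmp <- tempfile("pages")
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE), add = TRUE)

  # One file per page; the returned names are in page order
  pages <- qpdf::pdf_split(pdf_file, output = file.path(tmp, "page"))
  sizes <- file.size(pages) / 10^6  # approximate per-page size in MB

  groups <- group_pages_by_size(sizes, max_mb)
  outputs <- sprintf("%s_%02d.pdf", out_prefix, seq_along(groups))
  for (k in seq_along(groups)) {
    # Copy the pages of this group straight from the original file
    qpdf::pdf_subset(pdf_file, pages = groups[[k]], output = outputs[k])
  }
  outputs
}
```

For example, split_pdf_by_size("input.pdf", max_mb = 2) would write chunk_01.pdf, chunk_02.pdf and so on to the working directory and return their names. Because the file is split only once and each chunk is written only once, this avoids the repeated create-check-delete loop from the question.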

Upvotes: 2
