user127161

Reputation: 41

Visit all subdirectories and extract first page from every pdf

I have a few folders of e-books and I want to extract the first page from every book. There are over two hundred books, so doing this manually would be a big pain and very time-consuming.

I have a command that does the job for a single file:

pdftk TehInput.pdf cat 1 output cover_TehInput.pdf

How do I wrap this into a single script that visits every subdirectory and names each output like cover_wtv-original-name-is.pdf? The output files can go anywhere, for example in the directory where the script was started or next to the original file.

Upvotes: 0

Views: 165

Answers (2)

Cyrus

Reputation: 88731

If your filenames contain no blanks or newlines:

find . -iname '*.pdf' -printf "%h %f\n" | sed -E 's|(.*) (.*)|echo pdftk \1/\2 cat 1 output \1/cover_\2|' | sh

As written, this only prints the pdftk commands. If the output looks okay, remove "echo " to actually run them.

Upvotes: 0

chiastic-security

Reputation: 20520

You want to use the find command for this. Something like:

find . -iname '*.pdf' -exec pdftk '{}' cat 1 output '{}'.cover.pdf ';'

This finds every PDF from the current directory (.) downwards and, for each one, executes

pdftk filename.pdf cat 1 output filename.pdf.cover.pdf

on it. It's the whole path that will get passed to pdftk, so you'll end up with the cover PDFs in the same directory as the original files. (You could do something to get rid of the .pdf.cover.pdf extensions if you need to.)
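One way to get the cover_<original-name>.pdf naming from the question, and to cope with blanks in filenames, is to let find hand the paths to a small inline shell script. This is only a sketch: the function name extract_covers and the PDFTK override variable are made up here for illustration, and the `! -name 'cover_*'` filter is an added assumption so that a second run skips covers produced by the first.

```shell
#!/bin/sh
# Sketch: write page 1 of every PDF under a directory to cover_<name>.pdf
# next to the original file.
# extract_covers and PDFTK are illustrative names, not part of pdftk or find.

extract_covers() {
    # $1 is the directory to search (default: current directory).
    # PDFTK may be set to "echo" for a dry run that only prints the arguments.
    PDFTK=${PDFTK:-pdftk} find "${1:-.}" -type f -iname '*.pdf' \
        ! -name 'cover_*' -exec sh -c '
        for f in "$@"; do
            # Same directory as the input, file name prefixed with cover_
            out="$(dirname "$f")/cover_$(basename "$f")"
            $PDFTK "$f" cat 1 output "$out"
        done
    ' sh {} +
}
```

After sourcing the file, `PDFTK=echo extract_covers .` should print the arguments that would be passed to pdftk without running it, and `extract_covers .` does the real extraction.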

Upvotes: 1
