AMER

Reputation: 1001

Crop and extract text from PDF

I have cropped a PDF using the following command.

gswin32c.exe ^
-o cropped.pdf ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [64 418 348 803] /PAGE pdfmark" ^
-f original.pdf

The PDF is cropped as expected. I then used the following command to extract the text from the cropped PDF.

gswin32c.exe ^
-q ^
-sFONTPATH=c:/windows/fonts ^
-dNODISPLAY ^
-dSAFER ^
-dDELAYBIND ^
-dWRITESYSTEMDICT ^
-dSIMPLE ^
-f ps2ascii.ps ^
-dFirstPage=1 ^
-dLastPage=1 ^
cropped.pdf ^
-> c:\output.txt ^
-dQUIET 

The output contains the text of the whole original page, not just the text inside the cropped area.

Can someone help me extract the text from only the cropped area of the PDF?

Thanks, Nazeer

Upvotes: 2

Views: 1714

Answers (2)

Kurt Pfeifle

Reputation: 90213

You may have more luck if you try a different way of converting cropped.pdf to text:

Open it in Acrobat/Adobe Reader.

Click 'File --> Save as Text...'

Upvotes: 0

Kurt Pfeifle

Reputation: 90213

The result you got is exactly what is to be expected.

  • Cropping a PDF page does NOT mean: cut off everything outside the cropped area and delete it.

  • Cropping means: display only what is inside the cropped area (and zoom to it), and hide what is around it.

So when you convert such a page to text, you also get the hidden content back, because it is still present in the page.
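
To get only the visible text, the extracted text has to be filtered by position against the CropBox. The following is a minimal sketch of that idea in Python, not a complete solution: it assumes you already have each word's bounding box in PDF user-space coordinates (origin at the bottom left), which you would have to obtain from a PDF library of your choice. The `Word` type, the helper names and the sample word positions are all hypothetical; only the CropBox values come from the question.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A word with its bounding box in PDF points (hypothetical structure)."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def inside_crop_box(word, crop_box):
    """Return True if the word's box lies entirely within the crop box."""
    llx, lly, urx, ury = crop_box
    return (word.x0 >= llx and word.y0 >= lly
            and word.x1 <= urx and word.y1 <= ury)


def visible_text(words, crop_box):
    """Join the words that fall inside the crop box, keeping their order."""
    return " ".join(w.text for w in words if inside_crop_box(w, crop_box))


# CropBox from the question: [llx lly urx ury]
CROP_BOX = (64, 418, 348, 803)

# Made-up word positions, for illustration only.
words = [
    Word("Invoice", 70, 780, 120, 792),   # inside the crop box
    Word("Total", 70, 430, 100, 442),     # inside the crop box
    Word("Footer", 70, 30, 110, 42),      # below the crop box
    Word("Sidebar", 400, 600, 450, 612),  # to the right of the crop box
]

print(visible_text(words, CROP_BOX))
```

This version drops any word that straddles the crop boundary. Depending on your documents, you may prefer to keep words that only partly overlap the box.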

Upvotes: 2
