I have cropped a PDF using the following command.
gswin32c.exe ^
-o cropped.pdf ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [64 418 348 803] /PAGE pdfmark" ^
-f original.pdf
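To repeat this crop from a script, you can assemble the same Ghostscript call programmatically. Below is a minimal Python sketch. It assumes Ghostscript is installed as `gswin32c.exe`, as in the command above, and `crop_args` and `crop_pdf` are hypothetical helper names. The CropBox array is a PDF rectangle `[llx lly urx ury]` in points (1/72 inch), measured from the page's lower-left corner, so `[64 418 348 803]` keeps an area 284 pt wide and 385 pt tall.

```python
import subprocess


def crop_args(src, dst, box, gs="gswin32c.exe"):
    """Build the Ghostscript argument list used in the question.

    box is a PDF rectangle [llx, lly, urx, ury] in points (1/72 inch),
    measured from the lower-left corner of the page.
    """
    llx, lly, urx, ury = box
    if urx <= llx or ury <= lly:
        raise ValueError("box must be [llx lly urx ury] with urx > llx and ury > lly")
    # Same pdfmark string as in the command line above.
    pdfmark = f"[/CropBox [{llx} {lly} {urx} {ury}] /PAGE pdfmark"
    return [gs, "-o", dst, "-sDEVICE=pdfwrite", "-c", pdfmark, "-f", src]


def crop_pdf(src, dst, box, gs="gswin32c.exe"):
    # Runs Ghostscript; requires it to be installed and on PATH.
    subprocess.run(crop_args(src, dst, box, gs), check=True)


args = crop_args("original.pdf", "cropped.pdf", [64, 418, 348, 803])
print(args)
```

Passing the arguments as a list lets `subprocess` handle the quoting of the pdfmark string, which otherwise has to be quoted by hand for cmd.exe.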
The PDF is cropped as expected. I then used the following command to extract the text from the cropped PDF:
gswin32c.exe ^
-q ^
-sFONTPATH=c:/windows/fonts ^
-dNODISPLAY ^
-dSAFER ^
-dDELAYBIND ^
-dWRITESYSTEMDICT ^
-dSIMPLE ^
-f ps2ascii.ps ^
-dFirstPage=1 ^
-dLastPage=1 ^
cropped.pdf ^
> c:\output.txt
The output contains the text of the whole original page, not only the text inside the cropped area.
Can someone help me extract only the text from the cropped area?
Thanks, Nazeer
You may have more luck if you use a different tool to convert cropped.pdf to text:
Open it in Acrobat/Adobe Reader.
Choose 'File --> Save as Text...'
The result you got is exactly what should be expected.
Cropping a PDF page does NOT mean cutting off and deleting everything outside the cropped area.
Cropping means: display only what is inside the crop area (and zoom to it), and hide everything around it. The content outside the CropBox is still present in the page's content stream.
So when you convert such a page to text, the extractor reads that whole content stream, and you get the hidden text back as well.
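Because the hidden text is still in the page, getting only the cropped text means filtering by position. Below is a minimal Python sketch of that idea. It assumes you already have text fragments with their coordinates from some extractor; the fragment data and the function names `inside` and `text_in_cropbox` are hypothetical and are not part of Ghostscript or ps2ascii. A fragment is kept only if its starting point falls inside the CropBox.

```python
def inside(box, x, y):
    """True if point (x, y) lies within a PDF rectangle [llx, lly, urx, ury]."""
    llx, lly, urx, ury = box
    return llx <= x <= urx and lly <= y <= ury


def text_in_cropbox(fragments, cropbox):
    """Keep only fragments whose starting point lies inside cropbox.

    fragments: iterable of (text, x, y) tuples, where (x, y) is the
    text position in PDF user space (points from the lower-left corner).
    """
    return [text for text, x, y in fragments if inside(cropbox, x, y)]


# Hypothetical fragments, as an extractor might report them for one page.
page = [
    ("Header outside crop", 30.0, 820.0),
    ("Inside the crop area", 70.0, 700.0),
    ("Also inside", 100.0, 450.0),
    ("Footer outside crop", 300.0, 40.0),
]
print(text_in_cropbox(page, [64, 418, 348, 803]))
# -> ['Inside the crop area', 'Also inside']
```

This only checks each fragment's starting point and ignores rotation and text that straddles the crop edge. Real extractors also report positions differently (some combine the text matrix with the current transformation matrix), so check what your tool actually returns before relying on this.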