Reputation: 9
I have tried just about every trick in the book, and the white space and tabs persist
I ran 'Recognize Text' in Adobe Acrobat to convert an image into a PDF from which I could grab the text.
The text comes in, but it contains a lot of what looks like whitespace and tabs, and none of the usual tricks will get rid of them:
$ltxt1 = str_replace(["\n", "\t", " ", ",", "\0", "\x0B"], ["~", "", "", "", "", ""], $text);
and
$ltxt = preg_replace('/\s+/', '', $ltxt1);
My guess is that the issue is related to the way I made the text renderable on the face of the PDF.
Upvotes: 0
Views: 29
Reputation: 1126
The built-in trim function should work.
If you want to be sure, then:
$clean_string = preg_replace('/\s/', '', $dirty_string);
The \s class matches all whitespace characters, and preg_replace already replaces every match, not just the first (PHP's preg functions do not take the /g modifier that JavaScript and Perl use).
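One thing worth checking (an assumption on my part, since it depends on how Acrobat encodes the recognized text): OCR output is often UTF-8 and can contain non-breaking spaces (U+00A0) or other Unicode whitespace that a plain \s will not match. A minimal sketch that also strips those:
// Assumes $dirty_string is valid UTF-8 text pulled from the OCR'd PDF.
// The character class also catches non-breaking spaces, which \s can miss,
// and the /u modifier makes the pattern operate on UTF-8 characters.
$clean_string = preg_replace('/[\s\x{00A0}]+/u', '', $dirty_string);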
Otherwise you'll need to see what is really there by looking at the hex, or it could even be margin or padding that's pushing everything around. Are there misplaced <p> or <span> tags that PDF renderers love to throw in with seemingly arbitrary formatting?
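If you want to inspect the hex, a quick sketch (reusing the $ltxt1 variable from the question) is to dump the raw bytes so the invisible characters become visible:
// Show the first 200 bytes as hex; e.g. c2a0 is a UTF-8 non-breaking space
// and e280af is a narrow no-break space, neither of which is removed by a
// plain str_replace on ' '.
echo bin2hex(substr($ltxt1, 0, 200));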
Upvotes: 1