Reputation: 51
I have this text:
possono
godere
di la spiaggia, situato a 7 km da il porto turistico di A , a 5 chilometri da l'aeroporto di
B.
ALBERGO: formato da monolocali, appartamenti con
I need to turn it into something like this with preg_replace:
possono godere di la spiaggia, situato a 7 km da il porto turistico di A, a 5 chilometri da l'aeroporto di B.
ALBERGO: formato da monolocali, appartamenti con
I tried a regular expression like '/[^\.]\n/', but it also matches the space after 'B.'.
Upvotes: 1
Views: 67
Reputation: 627082
Use
$str = 'possono
godere
di la spiaggia, situato a 7 km da il porto turistico di A , a 5 chilometri da l\'aeroporto di
B.
ALBERGO: formato da monolocali, appartamenti con';
// m: ^ matches at the start of every line; u: pattern and subject are treated as UTF-8
$res = preg_replace('~\s+(?!^[A-Z]+:)~um', ' ', $str);
echo $res;
See the PHP demo
The pattern \s+(?!^[A-Z]+:) matches:
\s+ - 1 or more whitespace characters that are not immediately followed by...
(?!^[A-Z]+:) - the start of a line (^; the m modifier makes ^ match at the beginning of each line instead of only at the start of the string), 1+ uppercase ASCII letters ([A-Z]+) and a :.
The u modifier is used just in case the strings contain Unicode letters. In that case, also replace [A-Z] with \p{Lu}.
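To illustrate the \p{Lu} remark, here is a minimal sketch that runs both character classes on a marker containing a non-ASCII uppercase letter. The sample text and the "CITTÀ:" marker are made up for the example and are not from the question.

```php
<?php
// Hypothetical sample: the marker line ends in a non-ASCII uppercase letter.
// Assumes this file is saved as UTF-8.
$str = "camere\ncon vista\nCITTÀ: centro\nstorico";

// ASCII-only class: [A-Z]+ stops before "À", the colon is not found,
// so the line break before the marker is replaced as well.
$ascii = preg_replace('~\s+(?!^[A-Z]+:)~um', ' ', $str);

// \p{Lu} matches any Unicode uppercase letter (it needs the u modifier).
$unicode = preg_replace('~\s+(?!^\p{Lu}+:)~um', ' ', $str);

echo $ascii, "\n---\n", $unicode, "\n";
```

With [A-Z] the marker line gets merged into the previous one, while the \p{Lu} version keeps the line break in front of it.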
Upvotes: 1
Reputation: 1458
I think this process should be split into several tasks. My proposal:
- Tidy up all whitespace sequences (\s+) and normalize them to a single standard space (remember to set the "global" flag).
- Restructure the text by identifying semantic markers like "ALBERGO: " and placing a line feed \n before each of them. You could even search for ". ALBERGO: " and replace it with ".\nALBERGO: ".
- Standardize (or beautify) the text by identifying isolated commas " , " and replacing them with ", ".
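The three steps could be sketched in PHP roughly as follows. The marker rule used here (a run of uppercase ASCII letters followed by a colon) is my assumption, so adjust it to whatever markers your data really has. Note that preg_replace already replaces every match, so PHP needs no separate "global" flag.

```php
<?php
$str = "possono\ngodere\ndi la spiaggia, situato a 7 km da il porto turistico di A , a 5 chilometri da l'aeroporto di\nB.\nALBERGO: formato da monolocali, appartamenti con";

// Step 1: collapse every run of whitespace (line breaks included) into one space.
$res = preg_replace('~\s+~u', ' ', $str);

// Step 2: turn the space in front of each marker into a line feed.
// Assumption: a marker is 1+ uppercase ASCII letters followed by a colon.
$res = preg_replace('~ (?=[A-Z]+:)~', "\n", $res);

// Step 3: tidy isolated commas, " , " becomes ", ".
$res = str_replace(' , ', ', ', $res);

echo $res, "\n";
```

For the sample text from the question this yields the two lines the asker wants, with "A , " normalized to "A, ".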
Upvotes: 0