Reputation: 1961
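I have this Romanian text:
Retailer-ul Amazon foloseste metode severe pentru a-si descuraja etc. angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata siluete de angajati care au furat produse, li se spune ce au furat si cat valorau produsele, aparand si mentiunea "arestat" sau "concediat", scrie Bloomberg. Unii spun ca... and so on
(In English: "The retailer Amazon uses harsh methods to discourage etc. its warehouse employees from stealing products. On screens they are shown silhouettes of employees who stole products and are told what they stole and how much the products were worth, along with the label "arrested" or "fired", Bloomberg reports. Some say that...")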
I am trying to remove every abbreviation that occurs inside a sentence. For example, etc. is an abbreviation because the word after it, angajatii, starts with a lowercase letter. By contrast, produse. is the end of a sentence because the word after it, Pe, starts with an uppercase letter, so I don't want to remove it.
I have this code:
$subject = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject);
It matches every lowercase word followed by a dot, a whitespace character (\s) and then a lowercase letter ([a-z]). The problem is that the lowercase letter of the following word is part of the match, so it gets replaced too: descuraja etc. angajatii turns into descuraja ngajatii instead of descuraja angajatii. I can't find a way to avoid that. How can I keep the same matching pattern but replace only the abbreviation, the dot and the whitespace after it? Thank you.
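A minimal PHP reproduction of the behavior described above, using only the phrase from the example (the variable $result is added here for illustration):

```php
<?php
// The trailing [a-z] is part of the match, so preg_replace() removes
// the first letter of the next word together with "etc. ".
$subject = 'descuraja etc. angajatii';
$result = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject);
echo $result, PHP_EOL; // descuraja ngajatii
```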
Upvotes: 1
Views: 1504
Reputation: 626845
You need to wrap the [a-z] in a positive lookahead:
\b[a-z]+\.\s(?=[a-z])
The lookahead construct only checks whether the pattern inside it appears to the right of the current position. So (?=[a-z]) checks that there is a lowercase ASCII letter right after the whitespace matched by \s. If there is one, a match is returned and the replacement occurs; if there is no lowercase letter there, the match fails and no replacement occurs. Because a lookahead is zero-width, the letter it checks is not part of the match, so the replacement leaves it in place.
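A minimal sketch of the fix, assuming the same preg_replace() call as in the question with only the pattern changed, applied to a shortened excerpt of the question's text (the excerpt and $result are chosen for illustration). The abbreviation etc. is removed, while produse. followed by Pe is left alone:

```php
<?php
// (?=[a-z]) requires a lowercase letter after the whitespace but does not
// consume it, so only "etc. " is removed; "produse. Pe" fails the lookahead.
$subject = 'a-si descuraja etc. angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata';
$result = preg_replace('~\b[a-z]+\.\s(?=[a-z])~', '', $subject);
echo $result, PHP_EOL;
// a-si descuraja angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata
```

Note that [a-z] only covers ASCII letters, so a following word that starts with a Romanian diacritic (for example ș) would not count as lowercase. If that matters, something like \p{Ll} together with the u modifier may be a better fit.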
Upvotes: 5