southpaw93

php preg_replace match string but replace only part of it

I have this text: Retailer-ul Amazon foloseste metode severe pentru a-si descuraja etc. angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata siluete de angajati care au furat produse, li se spune ce au furat si cat valorau produsele, aparand si mentiunea "arestat" sau "concediat", scrie Bloomberg. Unii spun ca... and so on. I am trying to remove every abbreviation that occurs inside a sentence. For example, etc. is an abbreviation because the word that follows it, angajatii, starts with a lowercase letter. By contrast, produse. is the end of a sentence because the word that follows it, Pe, starts with an uppercase letter, and I don't want to remove it.

I have this code: $subject = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject); It matches every lowercase word followed by a dot, a whitespace character (\s) and a lowercase letter ([a-z]). The problem is that the lowercase letter is part of the match, so it is removed too: descuraja etc. angajatii turns into descuraja ngajatii instead of descuraja angajatii. How can I keep the same matching condition but replace only the abbreviation, the dot and the whitespace after it, leaving the first letter of the following word intact? Thank you.
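A minimal PHP reproduction of the behavior described above, using a short excerpt of the sample text (the excerpt is trimmed for illustration). The trailing [a-z] is consumed by the match, so preg_replace() deletes it along with "etc. ".

```php
<?php
// Short excerpt of the question's text (trimmed for illustration).
$subject = 'pentru a-si descuraja etc. angajatii din depozite';

// Original pattern: the final [a-z] is part of the match,
// so the first letter of "angajatii" is removed as well.
$result = preg_replace('~\b[a-z]+\.\s[a-z]~', '', $subject);

echo $result, PHP_EOL; // pentru a-si descuraja ngajatii din depozite
```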

Answers (1)

Wiktor Stribiżew

You need to wrap the [a-z] in a positive lookahead:

\b[a-z]+\.\s(?=[a-z])

See the regex demo
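A minimal PHP sketch of the same preg_replace() call with the lookahead pattern, run on a shortened excerpt of the question's text (the excerpt is an assumption for brevity). "etc. " is removed, while "produse. Pe" is left alone because P is uppercase.

```php
<?php
// Shortened excerpt of the question's sample text.
$subject = 'Retailer-ul Amazon foloseste metode severe pentru a-si descuraja etc. '
    . 'angajatii din depozite sa nu mai fure din produse. Pe ecrane li se arata '
    . 'siluete de angajati care au furat produse.';

// (?=[a-z]) only checks the next character; it is not replaced.
$result = preg_replace('~\b[a-z]+\.\s(?=[a-z])~', '', $subject);

echo $result, PHP_EOL;
```

The output keeps "descuraja angajatii" intact and leaves the sentence boundary "produse. Pe" unchanged.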

The lookahead construct only checks whether the pattern inside it appears immediately to the right of the current position. It is zero-width: whatever it checks is not consumed and does not become part of the match. So, (?=[a-z]) checks whether there is a lowercase ASCII letter right after the whitespace matched by \s. If there is, a match is returned and only the abbreviation, the dot and the whitespace are replaced; if there is no lowercase letter, the match fails and no replacement occurs.
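To see the zero-width behavior directly, a small sketch using preg_match() compares what each pattern actually matches, and shows the failing case where the next word is capitalized (the test strings are illustrative excerpts):

```php
<?php
$subject = 'descuraja etc. angajatii';

// With the lookahead, the matched text stops at the whitespace.
preg_match('~\b[a-z]+\.\s(?=[a-z])~', $subject, $m);
var_dump($m[0]); // string(5) "etc. "

// Without it, the lowercase letter is consumed as part of the match.
preg_match('~\b[a-z]+\.\s[a-z]~', $subject, $m2);
var_dump($m2[0]); // string(6) "etc. a"

// Next word starts with an uppercase letter: the lookahead fails, no match.
$n = preg_match('~\b[a-z]+\.\s(?=[a-z])~', 'din produse. Pe ecrane');
var_dump($n); // int(0)
```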