I am certainly not very experienced with regex, but I've spent quite a while on this and would usually have figured it out by now. I'm guessing someone else will have the answer right away, as my goal is very simple:
I need a simple regex to brute-singularize words (i.e. remove -es or -s from the ends). The syntax for this is easy. What gets complicated is restricting it to words that are longer than 3 characters, so that "US" doesn't become "U".
Here is what I am testing with:
childrens horses horse bobs us
which should match the -s/-es endings of "childrens", "horses" and "bobs", but nothing in "horse" or "us".
This is being done in a POSIX environment (Postgres), so that is also a bit restrictive.
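To make the problem concrete, here is a minimal sketch of the "easy" version with no length check, written with Python's re module purely for illustration (the real target is Postgres). The pattern and the function name naive_singularize are hypothetical, not taken from the question.

```python
import re

# Hypothetical naive pattern: strip a trailing "s" or "es" at the end of
# any word, with no restriction on word length.
NAIVE = re.compile(r"(s|es)\b", re.IGNORECASE)

def naive_singularize(text: str) -> str:
    # re.sub removes every match in the string.
    return NAIVE.sub("", text)

print(naive_singularize("childrens horses horse bobs us"))
# -> children hors horse bob u
```

The last word shows exactly the failure described: "us" comes out as "u".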
If I understood you correctly, this should work:
(?<=\w{3})(s|es)\b /i
Be advised that the trailing /i is not part of the regex; it's just the case-insensitive flag. You may also want to add the g and m flags so that the pattern is applied across the entire string. Here is the breakdown:
(?<=\w{3}) - positive lookbehind, checking that there are 3 word characters preceding the following pattern
(s|es) - a capture group, looking for the characters s or es
\b - checking that the end of a word follows right after the pattern
Also be advised that this pattern does not distinguish singular words that end in s (like proteus), and I'm very doubtful this task can be properly done by regular expression alone.
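To sanity-check the pattern outside the database, here is a minimal sketch using Python's re module (an illustrative choice; the question targets Postgres). Python accepts this lookbehind because \w{3} has a fixed width, the /i flag becomes re.IGNORECASE, and re.sub replaces every match the way the g flag would. The function name brute_singularize is hypothetical. When porting back to PostgreSQL, keep in mind that its regex flavor treats \b as a backspace escape, so the word boundary would be written \y there, and the flags go in the fourth argument of regexp_replace (e.g. 'gi') rather than after a slash; also, as far as I know, lookbehind is only supported from PostgreSQL 9.6 onward.

```python
import re

# The answer's pattern; the "/i" suffix is expressed as re.IGNORECASE.
PATTERN = re.compile(r"(?<=\w{3})(s|es)\b", re.IGNORECASE)

def brute_singularize(text: str) -> str:
    # Replace every match, equivalent to using the "g" flag.
    return PATTERN.sub("", text)

print(brute_singularize("childrens horses horse bobs us"))
# -> children hors horse bob us

# Known limitation: singular words ending in "s" are stripped too.
print(brute_singularize("proteus"))
# -> proteu
```

The output shows both the intended effect ("us" survives because fewer than 3 word characters precede its "s") and the caveat above: "horses" loses its whole "es" ending, and the singular "proteus" is truncated as well.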