Alexi Theodore

Reputation: 1677

Regex: How to find a pattern on words of a certain length

I am not very experienced with regex, but I've spent quite a while on this, and I have usually figured things out by this point. I'm guessing someone else will have the answer right away, as my goal is very simple:

I need a simple regex to brute-singularize words (i.e., remove -es or -s from the ends). The syntax for this is easy. What gets complicated is restricting it to words that are longer than 3 characters, so that "US" doesn't become "U".

Here is what I am testing with:

childrens horses horse bobs us

which should match like so:

childrens horses horse bobs us

This is being done in a POSIX environment (Postgres), so the available regex syntax is also a bit restrictive.

Upvotes: 0

Views: 137

Answers (1)

Aleksandr Medvedev

Reputation: 8978

If I understood you correctly, this should work:

(?<=\w{3})(s|es)\b /i

Be advised that the trailing /i is not part of the regex; it is just the case-insensitive flag. You may also want to add the g flag (and m for multi-line input) so that the entire string is processed rather than only the first match. Here is the breakdown:

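  • (?<=\w{3}) - positive lookbehind, checking that the three characters immediately before the following pattern are all word characters, so the word has at least 3 characters before the suffix
  • (s|es) - a capture group, matching either s or es
  • \b - word boundary, checking that the word ends right after the pattern. Note that in PostgreSQL's own regex flavor \b means backspace; the word-boundary escape there is \y.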
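To see how the three parts work together, here is a minimal sketch in Python rather than Postgres, since Python's re module accepts the pattern as written above. The function name brute_singularize is only an illustrative choice, and re.IGNORECASE stands in for the /i flag.

```python
import re

# The pattern from the answer; re.IGNORECASE plays the role of the /i flag.
PLURAL_SUFFIX = re.compile(r"(?<=\w{3})(s|es)\b", re.IGNORECASE)


def brute_singularize(text):
    """Strip a trailing -s or -es from every word that has at least
    three word characters before the suffix."""
    # Pattern.sub replaces all matches, which is what the g flag does elsewhere.
    return PLURAL_SUFFIX.sub("", text)


print(brute_singularize("childrens horses horse bobs us"))
# prints: children hors horse bob us
```

"us" is left alone because there are not three word characters in front of its s, and "horses" loses -es because the lookbehind and \b are both satisfied there. In Postgres the same idea would presumably go through regexp_replace with the 'gi' flags and \y in place of \b, but that variant is not tested here.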

Also be advised that this pattern does not distinguish words that already end with s in their singular form (like proteus), and I'm very doubtful this task can be done properly with a regular expression alone.
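The limitation is easy to reproduce. The sketch below, again in Python, shows the pattern mangling "proteus" and one possible workaround: checking a hand-maintained exception set before applying the regex. The set KEEP_AS_IS and the function singularize_word are hypothetical names introduced only for illustration, and the word list is nowhere near complete.

```python
import re

PLURAL_SUFFIX = re.compile(r"(?<=\w{3})(s|es)\b", re.IGNORECASE)

# Hypothetical exception list: words whose singular form already ends in s.
KEEP_AS_IS = {"proteus", "virus", "atlas"}


def singularize_word(word):
    """Strip -s/-es from a single word unless it is a known exception."""
    if word.lower() in KEEP_AS_IS:
        return word
    return PLURAL_SUFFIX.sub("", word)


# The bare regex cannot tell that "proteus" is already singular.
print(PLURAL_SUFFIX.sub("", "proteus"))  # prints: proteu
# With the exception set, the word is left untouched.
print(singularize_word("proteus"))       # prints: proteus
```

An exception list only covers the words someone remembered to add, which is why a dictionary- or stemmer-based approach is likely a better fit if accuracy matters.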

Upvotes: 1
