verbatim

Reputation: 229

Match Strings Composed of Substrings in a Given Set

By searching a dictionary of words, I am trying to find strings made up of substrings.

First, finding strings made of given letters is straightforward:
1. [abcdefgjlmnqrsz]+

The regex above finds any word or phrase made up of the listed letters. What I am trying to figure out is how to find strings made up of given substrings:

For example, given dictionary = ["neon","none","dog","bear","bare"]

and the regex:
2. [ar|be|o|ne|n]+

I would like to find: neon, bear

But regex 2 is incorrect, because it finds: neon, none, bear, bare.
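A minimal reproduction of this behaviour, sketched in Python's re module (the question does not name a regex engine, so Python is an assumption), and using re.fullmatch on the assumption that each whole dictionary word must match:

```python
import re

words = ["neon", "none", "dog", "bear", "bare"]

# [ar|be|o|ne|n] is a character class: it matches ONE character from the
# set {a, r, |, b, e, o, n}. The "+" then repeats that single-character match.
char_class = re.compile(r"[ar|be|o|ne|n]+")

# fullmatch requires the entire word to be covered by the pattern.
found = [w for w in words if char_class.fullmatch(w)]
print(found)
```

Every word spelled only from those seven characters matches, so bare (and any other rearrangement) slips through.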

Any help is appreciated.

Upvotes: 1

Views: 112

Answers (1)

zx81

Reputation: 41848

A Character Class Matches One Single Character

You are looking for:

\b(?:ar|be|o|ne|n)+\b

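Note that [things] is a character class, which matches one single character. Therefore [ar|be|o|ne|n] does not mean what you thought: it means "one character that is one of a, r, |, b, e, |, o, |, n, e, |, n", which collapses to the set a, b, e, n, o, r and |. With the + after it, any string built from those characters matches, which is why bare is found as well.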
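To see the character-class point concretely, this small Python sketch (Python's re module is an assumption; the candidate alphabet is chosen only for illustration) tests each lowercase letter, plus |, against the class on its own and keeps the ones it accepts:

```python
import re

# The class from the question, without the "+": exactly one character.
cls = re.compile(r"[ar|be|o|ne|n]")

# Candidate characters to probe: the lowercase alphabet plus a literal pipe.
candidates = "abcdefghijklmnopqrstuvwxyz|"

# Keep only the characters the class accepts on their own.
accepted = "".join(c for c in candidates if cls.fullmatch(c))
print(accepted)  # abenor|
```

Only seven distinct characters survive; the repeated e, n and | in the brackets add nothing.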

Explanation

  • \b is a word boundary: it matches a position where one side is a word character (a letter, digit or underscore) and the other side is not (for instance a space character, or the beginning or end of the string)
  • (?: ... ) is a non-capturing group; it groups the alternatives so that the + repeats the whole alternation, one substring at a time
  • | is the alternation (OR) operator
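A minimal sketch of the corrected pattern in Python's re module (the engine is an assumption, and joining the dictionary into one space-separated string is only for illustration). It also shows a per-word variant, where re.fullmatch takes the place of the \b anchors. Note that none is found too, because it splits as n + o + ne, so with this set of substrings the result is neon, none, bear:

```python
import re

words = ["neon", "none", "dog", "bear", "bare"]

# Alternation inside a non-capturing group: each repetition of (?:...)+
# consumes one whole substring from the set, not one character.
pattern = re.compile(r"\b(?:ar|be|o|ne|n)+\b")

# Search a space-separated "dictionary"; \b keeps matches to whole words.
text = " ".join(words)
found = pattern.findall(text)
print(found)  # ['neon', 'none', 'bear']

# Alternatively, test each entry on its own; fullmatch replaces the \b anchors.
per_word = [w for w in words if re.fullmatch(r"(?:ar|be|o|ne|n)+", w)]
print(per_word)  # ['neon', 'none', 'bear']
```

Because the pattern has no capturing groups, findall returns the full matched words.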

Upvotes: 4
