Atlas80b

Reputation: 119

Regular expression to check strings containing a set of words separated by a delimiter

As the title says, I'm trying to build up a regular expression that can recognize strings with this format:

word!!cat!!DOG!! ... Phone!!home!!

where !! is used as a delimiter. Each word must be between 1 and 5 characters long. Empty words are not allowed, i.e. no strings like !!, !!!!, etc.

A word can only contain alphabetical characters between a and z (case insensitive). After each word I expect to find the special delimiter !!.

I came up with the solution below, but since I need to add other checks later (e.g. words can contain spaces), I would like to know if I'm on the right track.

(([a-zA-Z]{1,5})([!]{2}))+

Also note that empty strings are not allowed, hence the use of +

Help and advice are very welcome since I just started learning how to build regular expressions. I ran some tests using http://regexr.com/ and it seems to be okay, but I want to be sure. Thank you!

Examples that shouldn't match:

a!!b!!aaaaaa!!
a123!!b!!c!!
aAaa!!bbb
aAaa!!bbb!

Upvotes: 1

Views: 1992

Answers (1)

ssc-hrep3

Reputation: 16089

Splitting the string and using the values between the !!

It depends on what you want to do with the regular expression. If you want to match the values between the !!, here are two ways:

Matching with groups

([^!]+)!!
  • [^!]+ requires at least 1 character other than !
  • !! instead of [!]{2} because it is the same but much more readable
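
As a rough illustration of the group-based approach, here is a minimal Python sketch. The choice of Python's `re` module and the sample string are my assumptions; the pattern itself is the one shown above.

```python
import re

# Pattern from the answer: one or more non-"!" characters, then the "!!" delimiter.
WORD_WITH_DELIMITER = re.compile(r"([^!]+)!!")

# Hypothetical sample input in the format described in the question.
sample = "word!!cat!!DOG!!"

# With exactly one capturing group, findall returns the group's text for each match,
# so the delimiters are consumed but not included in the result.
words = WORD_WITH_DELIMITER.findall(sample)
print(words)  # ['word', 'cat', 'DOG']
```

Note that this only extracts words; it does not check that the whole string is well formed.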

Matching with lookahead

If you only want to match the actual word (and not the two !), you can do this by using a positive lookahead:

[^!]+(?=!!)
  • (?=...) is a positive lookahead. It requires its contents, here !!, to appear directly after the matched word, but it does not include them in the resulting match.
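
A small Python sketch of the lookahead variant, assuming the `re` module and made-up sample strings. It shows that each match covers only the word, and that a trailing word without a full !! after it is skipped.

```python
import re

# Pattern from the answer: non-"!" characters that are immediately followed by "!!".
# The lookahead checks for the delimiter without consuming it.
WORD_ONLY = re.compile(r"[^!]+(?=!!)")

# Hypothetical sample inputs.
sample = "word!!cat!!DOG!!"
incomplete = "aAaa!!bbb!"  # last word is followed by a single "!"

matches = [m.group(0) for m in WORD_ONLY.finditer(sample)]
spans = [m.span() for m in WORD_ONLY.finditer(sample)]
partial = WORD_ONLY.findall(incomplete)

print(matches)  # ['word', 'cat', 'DOG']
print(spans)    # [(0, 4), (6, 9), (11, 14)] -> the "!!" is outside every span
print(partial)  # ['aAaa'] -> "bbb" is not followed by "!!", so it is not matched
```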

Here is a live example.


Validating the string

If, however, you want to check the validity of the whole string, then you need something like this:

^([^!]+!!)+$
  • ^ start of the string
  • $ end of the string
  • It requires the whole string to consist only of ([^!]+!!), repeated one or more times.

If [^!] does not fit your requirements, you can of course replace it with [a-zA-Z] or similar. To also enforce the 1-to-5-character limit from the question, use [a-zA-Z]{1,5} in place of [^!]+.
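
To see how whole-string validation behaves on the examples from the question, here is a hedged Python sketch. The helper name `is_valid` and the names `LOOSE` and `STRICT` are mine, and the stricter pattern is just the answer's pattern with [a-zA-Z]{1,5} substituted in, as the question's rules suggest.

```python
import re

# The answer's pattern: any non-"!" characters as a word.
LOOSE = re.compile(r"^([^!]+!!)+$")
# Variant following the question's rules: 1 to 5 letters per word.
STRICT = re.compile(r"^([a-zA-Z]{1,5}!!)+$")


def is_valid(text, pattern=STRICT):
    """Return True if the whole string is one or more 'word!!' groups.

    fullmatch is used because in Python "$" alone also matches just
    before a trailing newline, which would let "a!!\n" through.
    """
    return pattern.fullmatch(text) is not None


# Strings from the question; the last four are the ones that should not match.
candidates = [
    "word!!cat!!DOG!!",
    "a!!b!!aaaaaa!!",
    "a123!!b!!c!!",
    "aAaa!!bbb",
    "aAaa!!bbb!",
]

for text in candidates:
    print(f"{text!r}: loose={is_valid(text, LOOSE)}, strict={is_valid(text)}")
```

The loose pattern accepts a!!b!!aaaaaa!! and a123!!b!!c!! because [^!] allows digits and has no length limit; only the stricter variant rejects all four negative examples.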

Upvotes: 1
