programminglearner

Reputation: 542

Python Regex Matching by Underscores and Strings

I have strings of the format:

Between the first and second underscores, the text is either "red" or "blue", and between the second underscore and the first double underscore, the text is either "one" or "two". Between the first and second double underscores is a Name: either a single first name, or a first name and last name separated by a single underscore. The Name section is delimited by the surrounding double underscores, so any single underscore inside it is part of the Name. (Note: the first letter of Name must be uppercase.)

Between the second and third double underscores is a nickname. Nicknames can likewise be multiple words separated by single underscores, and anything found in this section is taken as the nickname. Whatever follows the third double underscore can be anything; if multiple words are needed, they can be separated by single underscores. This remaining portion of the string is optional.

Here is what I have so far for my regex:

always_(?:red|blue)_(?:one|two)__[A-Z]{1,1}....

I don't want to use \w+ to match the name, because \w also matches underscores and would therefore consume the double underscores that follow the Name. I'm stuck on where to go from here.

To clarify further, I want to catch any strings that do not follow that format.

Upvotes: 1

Views: 982

Answers (3)

Patrick Haugh

Reputation: 61014

I came up with

always_(red|blue)_(one|two)__((?:[A-Z][a-z]+_?)+)__((?:_?[a-z]+)+)(?:__(\w+))?

which works for the examples here, though you might want to do some more testing.
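
One way to do that extra testing is a small harness around re.fullmatch. This is only a sketch: the sample strings are hypothetical, picked to exercise the rules in the question, and using fullmatch (so that trailing text outside the pattern is rejected) is an assumption on my part, not something the pattern itself requires.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"always_(red|blue)_(one|two)__((?:[A-Z][a-z]+_?)+)__((?:_?[a-z]+)+)(?:__(\w+))?"
)

# Hypothetical sample strings, chosen only to exercise the rules in the question.
samples = [
    "always_red_one__Darrel_Jack__jackie__enter_anything_here",
    "always_blue_two__Darrel__jackie",    # no trailing section
    "always_red_one__darrel__jackie",     # name does not start with uppercase
    "always_green_one__Darrel__jackie",   # colour is neither red nor blue
]


def check(s):
    """Return the captured groups if s matches in full, else None."""
    m = PATTERN.fullmatch(s)
    return m.groups() if m else None


for s in samples:
    print(s, "->", check(s))
```

The first two samples should come back with their captured groups and the last two with None. Strings with digits or uppercase letters in the nickname are also rejected by this pattern, which may or may not be what you want.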

Upvotes: 1

anubhava

Reputation: 785276

You may use this regex that follows all the rules defined in your question:

^always_(red|blue)_(one|two)__([A-Z][a-zA-Z]*(?:_[A-Z][a-zA-Z]*)?)__([a-zA-Z]+(?:_[a-zA-Z ]+)*)(?:__|$)
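
Since the goal is to catch the strings that do not follow the format, the pattern can be used as a filter. A rough sketch follows; find_invalid and the sample list are names made up for illustration. Note that the pattern ends at (?:__|$), so whatever comes after the third double underscore is not inspected.

```python
import re

# Same pattern as above, split across lines only for readability.
PATTERN = re.compile(
    r"^always_(red|blue)_(one|two)"
    r"__([A-Z][a-zA-Z]*(?:_[A-Z][a-zA-Z]*)?)"
    r"__([a-zA-Z]+(?:_[a-zA-Z ]+)*)(?:__|$)"
)


def find_invalid(strings):
    """Return the strings that do NOT match the expected format."""
    return [s for s in strings if PATTERN.search(s) is None]


# Hypothetical inputs for illustration.
samples = [
    "always_red_one__Darrel_Jack__jackie__enter_anything_here",
    "always_blue_two__Darrel__jack_the_ripper",
    "always_red_one__darrel__jackie",   # lowercase name
    "never_red_one__Darrel__jackie",    # wrong prefix
]

print(find_invalid(samples))
```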


Upvotes: 1

Daweo

Reputation: 36550

Are you limited solely to re? If not, I think this task becomes easier after you split your string at __. I would do:

s = "always_red_one__Darrel_Jack__jackie__enter_anything_here"
parts = s.split("__")
print(parts)

Output:

['always_red_one', 'Darrel_Jack', 'jackie', 'enter_anything_here']

Then you can use always_(?:red|blue)_(?:one|two) to check that parts[0] is OK, parts[1][0].isupper() to check that the second part starts with an uppercase letter, and len(parts)==4 to check that there is the correct number of parts.
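
Put together, those checks could look roughly like the sketch below. The function name is_valid is made up for illustration, and two details are assumptions rather than part of the checks described above: it also accepts 3 parts, because the question says the trailing section is optional, and it uses name[:1] instead of name[0] so that an empty name does not raise an IndexError.

```python
import re

# Prefix rule from above; fullmatch ties it to the whole first part.
PREFIX = re.compile(r"always_(?:red|blue)_(?:one|two)")


def is_valid(s):
    """Check a string by splitting it at double underscores."""
    parts = s.split("__")
    # Assumption: 3 parts (no trailing section) is allowed as well as 4.
    if len(parts) not in (3, 4):
        return False
    prefix, name, nickname = parts[0], parts[1], parts[2]
    return (
        PREFIX.fullmatch(prefix) is not None
        and name[:1].isupper()   # empty name gives "" and isupper() is False
        and nickname != ""
    )


print(is_valid("always_red_one__Darrel_Jack__jackie__enter_anything_here"))
print(is_valid("always_red_one__darrel__jackie"))
```

This does not check which characters appear in the name or the nickname; if that matters, a per-part regex can be added in the same way as for the prefix.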

Upvotes: 0
