Reputation: 18840
I am trying to match string with underscores, throughout the string there are underscores but I want to match the strings that that has strings after the last underscore: Let me provide an example:
s = "hello_world"
s1 = "hello_world_foo"
s2 = "hello_world_foo_boo"
In my case I only want to capture s1 and s2.
I started with following, but can't really figure how I would do the match to capture strings that has strings after hello_world
's underscore.
rgx = re.compile(ur'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)$', re.I | re.U)
Upvotes: 0
Views: 69
Reputation: 99081
Try this:
reobj = re.compile("^(?P<firstpart>[a-z]+)_(?P<secondpart>[a-z]+)_(?P<lastpart>.*?)$", re.IGNORECASE)
result = reobj.findall(subject)
Regex Explanation
^(?P<firstpart>[a-z]+)_(?P<secondpart>[a-z]+)_(?P<lastpart>.*?)$
Options: case insensitive
Assert position at the beginning of the string «^»
Match the regular expression below and capture its match into backreference with name “firstpart” «(?P<firstpart>[a-z]+)»
Match a single character in the range between “a” and “z” «[a-z]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match the regular expression below and capture its match into backreference with name “secondpart” «(?P<secondpart>[a-z]+)»
Match a single character in the range between “a” and “z” «[a-z]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match the regular expression below and capture its match into backreference with name “lastpart” «(?P<lastpart>.*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Upvotes: 1
Reputation: 111
If I understand what you are asking for (you want to match string with more than one underscore and following text)
rgx = re.compile(ur'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)_[^_]+$', re.I | re.U)
Upvotes: 0