add-semi-colons

Reputation: 18840

Match only the strings that have text after the last underscore

I am trying to match strings that contain underscores. There are underscores throughout each string, but I only want to match the strings that have more text after a further underscore, i.e. something after the last underscore that follows hello_world. For example:

s = "hello_world"
s1 = "hello_world_foo"
s2 = "hello_world_foo_boo"

In my case I only want to capture s1 and s2.

I started with the following, but I can't figure out how to write the match so that it only captures strings that have more text after the underscore following hello_world.

import re
rgx = re.compile(r'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)$', re.I | re.U)
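
The attempt above matches all three strings, which is why it cannot tell them apart: \w already includes the underscore, so \w+ can swallow underscores and the pattern only needs a single "_" in total. A quick check, assuming Python 3 (where the ur'' prefix is a syntax error, so a plain r'' raw string is used instead):

```python
import re

# The question's pattern, written as a Python 3 raw string (r'' instead of ur'').
rgx = re.compile(r'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)$', re.I | re.U)

samples = ["hello_world", "hello_world_foo", "hello_world_foo_boo"]

for text in samples:
    m = rgx.search(text)
    # \w matches letters, digits AND "_", so firstpart can absorb extra
    # underscores and one "_" is enough for the whole pattern to match.
    print(text, "->", m.groupdict() if m else None)
```

For hello_world_foo the greedy firstpart backtracks only as far as the last underscore, so it captures hello_world and secondpart captures foo.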

Upvotes: 0

Views: 69

Answers (2)

Pedro Lobito

Reputation: 99081

Try this:

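import re
subject = "hello_world_foo"  # example input; replace with your own string
reobj = re.compile(r"^(?P<firstpart>[a-z]+)_(?P<secondpart>[a-z]+)_(?P<lastpart>.*?)$", re.IGNORECASE)
result = reobj.findall(subject)  # list of (firstpart, secondpart, lastpart) tuples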

Regex Explanation

^(?P<firstpart>[a-z]+)_(?P<secondpart>[a-z]+)_(?P<lastpart>.*?)$

Options: case insensitive

Assert position at the beginning of the string «^»
Match the regular expression below and capture its match into backreference with name “firstpart” «(?P<firstpart>[a-z]+)»
   Match a single character in the range between “a” and “z” «[a-z]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match the regular expression below and capture its match into backreference with name “secondpart” «(?P<secondpart>[a-z]+)»
   Match a single character in the range between “a” and “z” «[a-z]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match the regular expression below and capture its match into backreference with name “lastpart” «(?P<lastpart>.*?)»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
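
To see which of the example strings this pattern accepts, it can be run against s, s1 and s2 from the question. This is a small check assuming Python 3; it uses reobj.match instead of findall only so the named groups are easy to read.

```python
import re

reobj = re.compile(r"^(?P<firstpart>[a-z]+)_(?P<secondpart>[a-z]+)_(?P<lastpart>.*?)$", re.IGNORECASE)

s = "hello_world"
s1 = "hello_world_foo"
s2 = "hello_world_foo_boo"

for subject in (s, s1, s2):
    m = reobj.match(subject)
    # The pattern needs two literal underscores, so "hello_world" does not match.
    print(subject, "->", m.groupdict() if m else None)
```

Note that lastpart is foo_boo for s2, because .*? can run across further underscores before reaching $. Also, since * allows zero characters, a string ending in a bare underscore such as hello_world_ matches too, with an empty lastpart; .+? would require at least one character.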

Upvotes: 1

Oyvindkg

Reputation: 111

If I understand what you are asking for (you want to match strings with more than one underscore and text after the last one), this should work:

import re
rgx = re.compile(r'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)_[^_]+$', re.I | re.U)
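
A quick check of this pattern against the three example strings, assuming Python 3 (so r'' replaces the Python 2 ur'' prefix):

```python
import re

rgx = re.compile(r'(?P<firstpart>\w+)[_]+(?P<secondpart>\w+)_[^_]+$', re.I | re.U)

for text in ("hello_world", "hello_world_foo", "hello_world_foo_boo"):
    m = rgx.search(text)
    # A match needs "_" before secondpart and another "_" before the final
    # underscore-free run [^_]+, so at least two underscores are required.
    print(text, "->", m.groupdict() if m else None)
```

Unlike the first answer, this pattern does not capture the text after the last underscore: [^_]+ matches it, but it is not inside a group.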

Upvotes: 0
