makansij

Reputation: 9885

Regex one-liner for matching only what comes after a certain word?

I want to extract song names from a list like this: 'some text here, songs: song1, song2, song3, fro: othenkl' and get ['song1', 'song2', 'song3']. My first attempt uses two regexes:

import re
result = re.findall(r'[Ss]ongs?:?.*', 'songs: songname1, songname2,')
print(re.findall(r'(?:(\w+),)*', result[0]))

This matches perfectly: ['', '', '', '', '', '', '', 'songname1', '', 'songname2', ''] (except for the empty strings, but that's no big deal).

But I want to do it in one line, so I do the following:

print(re.findall(r'[Ss]ongs?:?(?:(\w+),)*', 'songs: songname1, songname2,'))

But I do not understand why this does not capture the same names as the two regexes above. Instead it returns:

['', 'name1', 'name2']

Is there a way to accomplish this in one line? It would be useful to be concise here. Thanks.

Upvotes: 5

Views: 4917

Answers (2)

Kasravnd

Reputation: 107347

You don't need re.findall in this case; it is better to use re.search to find the sequence of songs and then split the result on the comma. You also don't need the character class [Ss] to match the capital letter; you can use the ignore-case flag (re.I) instead:

>>> s ='some text here, songs: song1, song2, song3, fro: othenkl'
>>> re.search(r'(?<=songs:)(.+),', s,flags=re.I).group(1).split(',')
[' song1', ' song2', ' song3']

(?<=songs:) is a positive lookbehind, which makes the regex engine match only text that is preceded by songs:, and (.+), matches the longest string after songs: that is followed by a comma, which is the sequence of your songs.

As a more general approach, instead of requiring a comma at the end of your regex, you can capture the song names based on the fact that they are followed by the pattern \s\w+:.

>>> re.search(r'(?<=songs:)(.+)(?=\s\w+:)', s).group(1).split(',')
[' song1', ' song2', ' song3', '']
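
The results above still carry leading spaces, and the second one ends with an empty string because of the comma after song3. A minimal sketch of one way to tidy that up, assuming the first pattern is the one you want; the helper name extract_songs is hypothetical and not part of the re module:

```python
import re

def extract_songs(text):
    """Return the comma-separated names that follow 'songs:' (hypothetical helper)."""
    # Same pattern as above: everything after 'songs:' up to the last comma on that line.
    m = re.search(r'(?<=songs:)(.+),', text, flags=re.I)
    if m is None:
        return []  # no 'songs:' label followed by a comma
    # Strip surrounding spaces and drop empty pieces left by stray commas.
    return [name.strip() for name in m.group(1).split(',') if name.strip()]

s = 'some text here, songs: song1, song2, song3, fro: othenkl'
print(extract_songs(s))  # ['song1', 'song2', 'song3']
```

Because (.+) is greedy, this would also pick up any later fields that contain commas, so it is only suitable when the song list is the last comma-separated run in the string.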

Upvotes: 2

Casimir et Hippolyte

Reputation: 89639

No, you can't do it in one pattern with the re module. What you can do is use the third-party regex module instead, with this pattern:

regex.findall(r'(?:\G(?!\A), |\msongs: )(\w++)(?!:)', s)

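Here \G matches at the position where the previous match ended, \A matches at the start of the string (so (?!\A) keeps \G from matching at the very beginning), \m matches at the start of a word, and ++ is a possessive quantifier (so \w++ cannot backtrack to a shorter word such as fr in fro: just to satisfy (?!:)).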
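
If installing regex is not an option, the chaining that \G provides can be approximated with the standard re module by anchoring each match at the end of the previous one via Pattern.match(string, pos). This is a rough sketch of that idea, not a single pattern: the function name songs_after_label is hypothetical, \w+\b stands in for the possessive \w++, and unlike findall with the pattern above it only handles the first songs: label.

```python
import re

def songs_after_label(text):
    # Find the label first (plays the role of the \msongs: branch).
    head = re.search(r'\bsongs: ', text)
    if head is None:
        return []
    # \w+\b must take the whole word, so 'fro' in 'fro:' is rejected by (?!:).
    word = re.compile(r'(\w+)\b(?!:)')
    sep = re.compile(r', ')
    names = []
    pos = head.end()
    while True:
        # match(text, pos) only succeeds exactly at pos, like \G would.
        m = word.match(text, pos)
        if m is None:
            break
        names.append(m.group(1))
        nxt = sep.match(text, m.end())
        if nxt is None:
            break
        pos = nxt.end()
    return names

s = 'some text here, songs: song1, song2, song3, fro: othenkl'
print(songs_after_label(s))  # ['song1', 'song2', 'song3']
```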

Upvotes: 2
