Regex one-liner for matching only what comes after a certain word?

Question

I want to extract song names from a string like this: 'some text here, songs: song1, song2, song3, fro: othenkl' and get ['song1', 'song2', 'song3']. My first attempt uses two regexes:

import re

result = re.findall(r'[Ss]ongs?:?.*', 'songs: songname1, songname2,')
print(re.findall(r'(?:(\w+),)*', result[0]))

This captures both names: ['', '', '', '', '', '', '', 'songname1', '', 'songname2', ''] (apart from the empty strings, but that's no big deal).

But I want to do it in one line, so I do the following:

print(re.findall(r'[Ss]ongs?:?(?:(\w+),)*', 'songs: songname1, songname2,'))

But I don't understand why this doesn't capture the same names as the two regexes above. It returns:

['', 'name1', 'name2']

Is there a way to accomplish this in one line? It would be useful to be concise here. Thanks.

Kasravnd · Accepted Answer

You don't need re.findall in this case. It is simpler to use re.search to find the sequence of songs and then split the result on commas. You also don't need the character class [Ss] to match a capital letter; you can pass the ignore-case flag (re.I) instead:

>>> import re
>>> s = 'some text here, songs: song1, song2, song3, fro: othenkl'
>>> re.search(r'(?<=songs:)(.+),', s, flags=re.I).group(1).split(',')
[' song1', ' song2', ' song3']

(?<=songs:) is a positive lookbehind, which makes the regex engine match only text that is preceded by songs:. Because .+ is greedy, (.+), matches the longest string after songs: that is followed by a comma, which is your sequence of songs.
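As a side note on why the one-line attempt in the question returned ['', 'name1', 'name2']: re.findall applies the whole pattern at every position, and [Ss]ongs?:? also matches the "song" prefix of songname1 and songname2, so the group only captures what comes after that prefix. The sketch below is one way to see this, using re.finditer to show each whole match next to its captured group. The variable names are only illustrative.

```python
import re

text = 'songs: songname1, songname2,'
pattern = r'[Ss]ongs?:?(?:(\w+),)*'

# Pair each whole match with its captured group.
# A group that did not participate is None here; re.findall reports it as ''.
hits = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
print(hits)
# [('songs:', None), ('songname1,', 'name1'), ('songname2,', 'name2')]
```

The lookbehind version avoids this because a single re.search locates the text after songs: once, and the splitting happens outside the regex.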

Also, as a more general approach, instead of specifying the comma at the end of your regex you can rely on the fact that the song names are followed by text matching the pattern \s\w+: and stop there with a lookahead:

>>> re.search(r'(?<=songs:)(.+)(?=\s\w+:)', s).group(1).split(',')
[' song1', ' song2', ' song3', '']
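Both results above still carry leading spaces, and the lookahead version ends with an empty string because the captured text ends in a comma. A minimal sketch of one way to clean this up follows, assuming the names themselves contain no commas. The helper name extract_songs is hypothetical and not part of the answer above.

```python
import re

def extract_songs(text):
    """Hypothetical helper: return the names listed after 'songs:'."""
    # Same lookbehind/lookahead pattern as above, with re.I so 'Songs:' also works.
    match = re.search(r'(?<=songs:)(.+)(?=\s\w+:)', text, flags=re.I)
    if match is None:
        return []  # no 'songs:' list followed by another 'label:'
    # Strip whitespace around each piece and drop empty pieces.
    return [part.strip() for part in match.group(1).split(',') if part.strip()]

print(extract_songs('some text here, songs: song1, song2, song3, fro: othenkl'))
# ['song1', 'song2', 'song3']
```

Note that .+ is greedy, so if several label: sections follow the song list, the capture runs up to the last one; a lazy .+? would stop at the first.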
