Reputation: 9885
I want to extract song names from a string like this: 'some text here, songs: song1, song2, song3, fro: othenkl' and get ['song1', 'song2', 'song3']. So far I do it with two regexes:
import re
result = re.findall(r'[Ss]ongs?:?.*', 'songs: songname1, songname2,')
print(re.findall(r'(?:(\w+),)*', result[0]))
This works: ['', '', '', '', '', '', '', 'songname1', '', 'songname2', ''] (except for the empty strings, but that is no big deal).
But I want to do it in one line, so I do the following:
print(re.findall(r'[Ss]ongs?:?(?:(\w+),)*', 'songs: songname1, songname2,'))
But I do not understand why this fails to capture the same names as the two regexes above. It returns:
['', 'name1', 'name2']
Is there a way to accomplish this in one line? It would be useful to be concise here. Thanks.
Upvotes: 5
Views: 4917
Reputation: 107347
You don't need re.findall in this case. It is better to use re.search to find the sequence of songs and then split the result on commas. Also, you don't need the character class [Ss] to match the capital letter; you can use the ignore-case flag (re.I) instead:
>>> import re
>>> s = 'some text here, songs: song1, song2, song3, fro: othenkl'
>>> re.search(r'(?<=songs:)(.+),', s, flags=re.I).group(1).split(',')
[' song1', ' song2', ' song3']
(?<=songs:) is a positive lookbehind, which makes the regex engine match only text that is preceded by songs:, and the greedy (.+), matches the longest string after songs: that is followed by a comma, which is the sequence of your songs.
More generally, instead of requiring a comma at the end of the regex, you can capture the song names based on the fact that they are followed by the pattern \s\w+: (whitespace, a word, then a colon).
>>> re.search(r'(?<=songs:)(.+)(?=\s\w+:)', s).group(1).split(',')
[' song1', ' song2', ' song3', '']
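The search-then-split idea can be wrapped in a small Python 3 helper. This is only a sketch, not part of the original answer: the name extract_songs is hypothetical, and it swaps the lookbehind for a lazy capture with a lookahead, assuming the list sits on one line after a songs: label and ends at the next "word:" label or at the end of the string. It also strips the leading spaces and drops the empty string seen in the outputs above.

```python
import re

def extract_songs(text):
    """Return the names listed after a 'songs:' label (hypothetical helper)."""
    # Lazily capture everything after 'songs:' up to the next ' word:' label
    # or, if there is no such label, up to the end of the string.
    match = re.search(r'songs:(.*?)(?=\s\w+:|$)', text, flags=re.I)
    if match is None:
        return []
    # Split on commas, trim whitespace, and drop empty pieces.
    return [name.strip() for name in match.group(1).split(',') if name.strip()]

s = 'some text here, songs: song1, song2, song3, fro: othenkl'
print(extract_songs(s))  # ['song1', 'song2', 'song3']
```

Returning an empty list when nothing matches avoids the AttributeError that calling .group(1) on a failed re.search would raise.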
Upvotes: 2
Reputation: 89639
No, you can't do it in one pattern with the re module. What you can do is use the third-party regex module instead, with this pattern:
regex.findall(r'(?:\G(?!\A), |\msongs: )(\w++)(?!:)', s)
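Where \G is the position after the previous match, \A is the start of the string, \m is a word boundary followed by word characters (the start of a word), and ++ is a possessive quantifier. Because \w++ cannot backtrack to a shorter word, the (?!:) check rejects a word that is followed by a colon, such as fro, and the chain of matches stops there.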
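Why a single re pattern falls short can be seen in a short Python 3 sketch (an illustration, not code from the answer): when a capture group sits inside a repeated group, re keeps only the text from the last repetition, so the earlier names are lost. The pattern below is an assumed simplification of the one in the question.

```python
import re

s = 'some text here, songs: song1, song2, song3, fro: othenkl'

# The inner group (\w+) is matched three times, but re stores only
# the text captured by the final successful repetition.
match = re.search(r'songs:(?:\s*(\w+),)*', s)
last_only = match.group(1)
print(last_only)  # song3

# The overall match still spans all three names, which is why searching
# first and splitting afterwards works with re.
whole = match.group(0)
print(whole)  # songs: song1, song2, song3,
```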
Upvotes: 2