Reputation: 407
I am trying to get a list of tracks (songs) from an album, and for a given track I would like to get all the ones that match it closely. I have included an example below. Any ideas on how to do this in Python? It seems that difflib.get_close_matches only works for single words and not for a sentence.
Example (to find anything that contains the string 'Around the world'):
tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']
Output:
Around The World (La La La La La) (Radio Version)
Around The World (La La La La La) (Alternative Radio Version)
Around The World (La La La La La) (Acoustic Mix)
Around The World (La La La La La) (Rüegsegger#Wittwer Club Mix)
Upvotes: 4
Views: 7865
Reputation: 1343
You can do it like this:
temp = "Around The World (La La La La La)"
# fh is assumed to be an already-open file with one track title per line
for line in fh:
    if temp in line:
        print(line.rstrip())
This prints every line of the file you are reading that contains temp.
Or you can use regex for doing the matching.
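The regex route could look roughly like the sketch below. It assumes the titles are already in a list rather than a file, and `find_tracks` is a hypothetical helper name, not something from a library. `re.escape` is used so that characters such as parentheses in the query are matched literally, and `re.IGNORECASE` lets 'Around the world' match 'Around The World'.

```python
import re

def find_tracks(query, tracks):
    """Return the tracks that contain `query`, ignoring case (hypothetical helper)."""
    # re.escape keeps characters such as '(' from being read as regex syntax.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [track for track in tracks if pattern.search(track)]

tracks = [
    'Around The World (La La La La La) (Radio Version)',
    'World In Motion',
    'Thinking Of You',
]
matches = find_tracks('Around the world', tracks)
print(matches)
```

This only finds exact substrings, so it will not help if the query contains typos.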
Upvotes: -2
Reputation: 63717
You can leverage the get_matching_blocks method of SequenceMatcher for this purpose:
>>> from pprint import PrettyPrinter
>>> from difflib import SequenceMatcher
>>> pp = PrettyPrinter(indent = 4)
>>> pp.pprint(tracks)
[ 'World In Motion',
'With You',
'Why Oh Why',
'Thinking Of You',
'My Heart Beats Like A Drum (Dam Dam Dam)',
'Mistake No. 2',
'Love Is Blind',
'Lonesome Suite',
'Let Me Come & Let Me Go',
'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
'Around The World (La La La La La) (Radio Version)',
'Around The World (La La La La La) (Alternative Radio Version)',
'Around The World (La La La La La) (Acoustic Mix)']
>>> seq = ((e, SequenceMatcher(None, 'Around the world', e).get_matching_blocks()[0]) for e in tracks)
>>> seq = [k for k, _ in sorted(seq, key = lambda e:e[-1].size, reverse = True)]
>>> pp.pprint(seq)
[ 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
'Around The World (La La La La La) (Radio Version)',
'Around The World (La La La La La) (Alternative Radio Version)',
'Around The World (La La La La La) (Acoustic Mix)',
'World In Motion',
'With You',
'Thinking Of You',
'Why Oh Why',
'My Heart Beats Like A Drum (Dam Dam Dam)',
'Mistake No. 2',
'Love Is Blind',
'Lonesome Suite',
'Let Me Come & Let Me Go']
>>>
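One thing to keep in mind with the session above: get_matching_blocks() returns blocks in order of position, so element [0] is the first block, not necessarily the longest one, and the comparison is case-sensitive. A possible variant, sketched here with hypothetical helper names (`longest_match_size`, `rank_by_longest_match`), lowercases both strings and ranks by `find_longest_match` instead:

```python
from difflib import SequenceMatcher

def longest_match_size(query, title):
    """Length of the longest substring shared by query and title, ignoring case."""
    a, b = query.lower(), title.lower()
    matcher = SequenceMatcher(None, a, b)
    # Search the full range of both strings for the single longest common block.
    return matcher.find_longest_match(0, len(a), 0, len(b)).size

def rank_by_longest_match(query, tracks):
    """Sort tracks so that those sharing the longest substring with query come first."""
    return sorted(tracks, key=lambda t: longest_match_size(query, t), reverse=True)

tracks = [
    'World In Motion',
    'Why Oh Why',
    'Around The World (La La La La La) (Acoustic Mix)',
]
ranked = rank_by_longest_match('Around the world', tracks)
print(ranked)
```

Like the original, this ranks every track rather than filtering, so you would still need to pick a minimum size to decide what counts as a match.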
Upvotes: 1
Reputation: 879341
difflib.get_close_matches can work with strings (other than single words). In this case, you need to lower the cutoff (the default is 0.6) and raise n, the maximum number of matches:
In [19]: import difflib
In [20]: tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']
In [21]: difflib.get_close_matches('Around the world', tracks, n = 4,cutoff = 0.3)
Out[21]:
['Around The World (La La La La La) (Acoustic Mix)',
'Around The World (La La La La La) (Radio Version)',
'Around The World (La La La La La) (Alternative Radio Version)',
'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)']
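To see why the default cutoff rejects these titles, it can help to look at the similarity scores directly. The sketch below assumes that get_close_matches scores each candidate with SequenceMatcher.ratio(), which is how the difflib documentation describes it; the variable names are mine. Because ratio() is relative to the combined length of both strings, a short query against a long title scores low even when the whole query is present.

```python
from difflib import SequenceMatcher

query = 'Around the world'
titles = [
    'Around The World (La La La La La) (Acoustic Mix)',
    'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
]

# ratio() is 2 * matched_characters / total_characters_in_both_strings,
# so longer titles are penalised even though the query text is the same.
scores = {title: SequenceMatcher(None, title, query).ratio() for title in titles}

for title, score in scores.items():
    print(round(score, 3), title)
```

Both scores fall between 0.3 and 0.6, which is why cutoff = 0.3 lets them through while the default does not.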
Upvotes: 8
Reputation: 32300
filter(lambda x: 'Around The World' in x, tracks)
This gives you a list of the songs that have 'Around The World' in the name. If you're using Python 3, cast it to a list (list(filter(...))) because it returns a filter object.
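Note that the `in` test is case-sensitive, so the question's 'Around the world' (lowercase t and w) would match nothing. A small sketch of a case-insensitive version, written as a list comprehension so it behaves the same on Python 2.7+ and 3; `tracks_containing` is a hypothetical name:

```python
def tracks_containing(query, tracks):
    """Return the tracks whose title contains query, ignoring case."""
    needle = query.lower()
    return [title for title in tracks if needle in title.lower()]

tracks = [
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'Love Is Blind',
]
found = tracks_containing('Around the world', tracks)
print(found)
```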
If there might be typos then I can't help you out.
Upvotes: 2