Prem Minister

Reputation: 407

Python find the closest matching sentence

I have a list of tracks (songs) from an album, and for a given search string I would like to get all the tracks that closely match it. An example is below. Any ideas on how to do this in Python? It seems that difflib.get_close_matches only works for single words, not for a sentence.

Example (find everything that contains the string 'Around the world'):

tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

Output:

 Around The World (La La La La La) (Radio Version)
 Around The World (La La La La La) (Alternative Radio Version)
 Around The World (La La La La La) (Acoustic Mix)
 Around The World (La La La La La) (Rüegsegger#Wittwer Club Mix)

Upvotes: 4

Views: 7865

Answers (4)

Hemant

Reputation: 1343

You can do it like this:

temp = "Around The World (La La La La La)"

# fh is assumed to be an already-open file with one track title per line
for line in fh:
    if temp in line:
        print(line.rstrip("\n"))

This prints every line of the file you are reading that contains temp. Note that the `in` test is case-sensitive.

Alternatively, you can use a regex for the matching.
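The regex route could look roughly like the sketch below. It is only an illustration: the helper name `find_tracks` is made up here, and the track list is a subset of the one in the question. It uses `re.IGNORECASE` so that 'Around the world' also matches 'Around The World'.

```python
import re

# Subset of the track list from the question.
tracks = [
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'Thinking Of You',
]


def find_tracks(query, titles):
    """Return the titles that contain query, ignoring case (hypothetical helper)."""
    # re.escape keeps characters such as '(' or '.' in the query literal.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]


matches = find_tracks('Around the world', tracks)
for title in matches:
    print(title)
```

This still only finds exact substrings, so it will not cope with typos in the query.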

Upvotes: -2

Abhijit

Reputation: 63717

You can leverage the get_matching_blocks method of SequenceMatcher for this purpose:

>>> from pprint import PrettyPrinter
>>> from difflib import SequenceMatcher
>>> pp = PrettyPrinter(indent = 4)
>>> pp.pprint(tracks)
[   'World In Motion',
    'With You',
    'Why Oh Why',
    'Thinking Of You',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go',
    'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)']
>>> seq = ((e, SequenceMatcher(None, 'Around the world', e).get_matching_blocks()[0]) for e in tracks)
>>> seq = [k for k, _ in sorted(seq, key = lambda e:e[-1].size, reverse = True)]
>>> pp.pprint(seq)
[   'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'With You',
    'Thinking Of You',
    'Why Oh Why',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go']
>>> 
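One caveat with the session above: it ranks by the first matching block only, and the comparison is case-sensitive, so 'Around the world' against 'Around The World ...' only scores the leading 'Around '. A possible variation, sketched below, lowercases both strings and ranks by the longest common run via `find_longest_match`. The helper name `longest_common_run` is an assumption for illustration, and the list is a subset of the question's tracks.

```python
from difflib import SequenceMatcher

# Subset of the track list from the question.
tracks = [
    'World In Motion',
    'With You',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
]


def longest_common_run(query, title):
    """Length of the longest contiguous run shared by query and title, ignoring case."""
    a, b = query.lower(), title.lower()
    matcher = SequenceMatcher(None, a, b)
    # Explicit bounds are passed so this also works on older Python 3 versions.
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


query = 'Around the world'
ranked = sorted(tracks, key=lambda t: longest_common_run(query, t), reverse=True)
for title in ranked:
    print(longest_common_run(query, title), title)
```

Because `sorted` is stable, titles with the same score keep their original relative order.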

Upvotes: 1

unutbu

Reputation: 879341

difflib.get_close_matches can work with strings (other than single words). In this case, you need to lower the cutoff (the default is 0.6), and raise n, the maximum number of matches:

In [19]: import difflib

In [20]: tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

In [21]: difflib.get_close_matches('Around the world', tracks, n = 4,cutoff = 0.3)
Out[21]: 
['Around The World (La La La La La) (Acoustic Mix)',
 'Around The World (La La La La La) (Radio Version)',
 'Around The World (La La La La La) (Alternative Radio Version)',
 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)']

Upvotes: 8

Volatility

Reputation: 32300

filter(lambda x: 'Around The World' in x, tracks)

This gives you a list of the songs that have 'Around The World' in the name. If you're using Python 3, cast it to a list (list(filter(...))) because it returns a filter object.

If there might be typos then I can't help you out.
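For the typo case, one possible approach (not part of the answer above, just a sketch) is to compare the query against the start of each title with `difflib.SequenceMatcher` and keep titles whose similarity is above a threshold. The helper names and the 0.8 threshold are assumptions chosen for illustration, and it only makes sense when the query is expected at the beginning of the title.

```python
from difflib import SequenceMatcher

# Subset of the track list from the question.
tracks = [
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'With You',
]


def prefix_similarity(query, title):
    """Similarity between query and the equally long start of title, ignoring case."""
    q = query.casefold()
    prefix = title.casefold()[:len(q)]
    return SequenceMatcher(None, q, prefix).ratio()


def fuzzy_prefix_matches(query, titles, threshold=0.8):
    """Titles whose beginning is close to query (threshold is an arbitrary choice)."""
    return [t for t in titles if prefix_similarity(query, t) >= threshold]


# 'Arond' is a deliberate typo.
typo_hits = fuzzy_prefix_matches('Arond the world', tracks)
for title in typo_hits:
    print(title)
```

Cutting the title down to the query's length keeps the long parenthesised suffixes from dragging the score down.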

Upvotes: 2
