Artur Sapek

Reputation: 2497

Python: detect strings that share words

I have a list of single and multi-word phrases:

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']

I want to detect that terms[0] and terms[1] share the word rock. Is there a Pythonic way to do this, instead of using a ton of for-loops, temporary lists, and split(' ')?

Basically, I'm trying to detect a half-equality of phrases.

Upvotes: 2

Views: 188

Answers (4)

John Machin

Reputation: 82934

Some variations on the answer of @MarkByers:

>>> from collections import defaultdict
>>>
>>> terms = [
...     'Electronic rock', 'Alternative rock', 'Indie pop',
...     'baa baa black sheep',
...     'Blackpool rock', # definition of "equality"?
...     'Rock of ages',
...     ]
>>>
>>> def process1():
...     d = defaultdict(list)
...     for term in terms:
...         for word in term.split():
...             d[word].append(term)
...     for k, v in d.items():  # Python 3: items() replaces iteritems()
...         if len(v) > 1:
...             print(k, v)
...
>>> def process2():
...     d = defaultdict(set)
...     for term in terms:
...         for word in term.split():
...             d[word.lower()].add(term)
...     for k, v in d.items():  # Python 3: items() replaces iteritems()
...         if len(v) > 1:
...             print(k, sorted(v))
...
>>> process1()
rock ['Electronic rock', 'Alternative rock', 'Blackpool rock']
baa ['baa baa black sheep', 'baa baa black sheep']
>>> process2()
rock ['Alternative rock', 'Blackpool rock', 'Electronic rock', 'Rock of ages']
>>>

Upvotes: 1

Daniel YC Lin

Reputation: 16002

See the question "How to find list intersection?"; I think the answer can be built from that idea. Your question doesn't say what result you want to end up with, so it would help to list the output you expect.

Here is a small example that may give you a hint. (I don't think it can be written clearly without split.)

a = terms[0].split()
b = terms[1].split()
list(set(a) & set(b))  # ['rock'], the words the two terms have in common
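The same intersection can be applied to every pair of terms rather than just the first two. What follows is only a minimal sketch, assuming Python 3 and that matching should ignore case; `shared_words` is a hypothetical helper name, not something from the question.

```python
from itertools import combinations

def shared_words(terms):
    """Map each pair of terms to the set of lowercase words they share.

    Hypothetical helper for illustration; pairs with no common word are omitted.
    """
    result = {}
    for a, b in combinations(terms, 2):
        # Lowercase before splitting so 'Rock' and 'rock' count as the same word.
        common = set(a.lower().split()) & set(b.lower().split())
        if common:
            result[(a, b)] = common
    return result

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
print(shared_words(terms))
# {('Electronic rock', 'Alternative rock'): {'rock'}}
```

This compares every pair, so the work grows quadratically with the number of terms; for a long list, an index from word to terms is likely the better choice.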

Upvotes: 1

jedwards

Reputation: 30210

This is a terribly inefficient solution for short phrases like these, but for longer strings you could use itertools.combinations to generate every pair of strings and then difflib to compare each pair. If you're only dealing with two- or three-word phrases, this solution is not for you.
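A rough sketch of that idea, assuming Python 3. The function name `similar_pairs` and the 0.6 default threshold are illustrative assumptions, not part of the suggestion above; `SequenceMatcher.ratio()` returns a similarity score between 0 and 1.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_pairs(strings, threshold=0.6):
    """Yield (a, b, ratio) for each pair whose similarity ratio meets threshold.

    Hypothetical helper; the threshold is an arbitrary cut-off to tune per data set.
    """
    for a, b in combinations(strings, 2):
        # Compare case-insensitively; ratio() is 1.0 for identical sequences.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            yield a, b, ratio

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
for a, b, ratio in similar_pairs(terms):
    print(a, '|', b, '|', round(ratio, 2))
```

Note that this measures character-level similarity, not shared words, so two phrases can score well without having a whole word in common.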

Upvotes: 1

Mark Byers

Reputation: 838226

You can use a dictionary to remember which words appear in which terms:

from collections import defaultdict

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
d = defaultdict(list)
for term in terms:
    for word in term.split():
        d[word].append(term)

# Python 3: items() replaces iteritems(), and print is a function
for k, v in d.items():
    if len(v) > 1:
        print(k, v)

Output:

rock ['Electronic rock', 'Alternative rock']

See it working online: ideone

Upvotes: 6
