Reputation: 2497
I have a list of single and multi-word phrases:
terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
I want to detect that terms[0] and terms[1] share the word rock. Is there a Pythonic way to do this, instead of using a ton of for-loops, temporary lists, and split(' ')?
Basically, I'm trying to detect a half-equality of phrases.
Upvotes: 2
Views: 188
Reputation: 82934
Some variations on the answer of @MarkByers:
>>> from collections import defaultdict
>>>
>>> terms = [
...     'Electronic rock', 'Alternative rock', 'Indie pop',
...     'baa baa black sheep',
...     'Blackpool rock', # definition of "equality"?
...     'Rock of ages',
...     ]
>>>
>>> def process1():
...     d = defaultdict(list)
...     for term in terms:
...         for word in term.split():
...             d[word].append(term)
...     for k, v in d.items():
...         if len(v) > 1:
...             print(k, v)
...
>>> def process2():
...     d = defaultdict(set)
...     for term in terms:
...         for word in term.split():
...             d[word.lower()].add(term)
...     for k, v in d.items():
...         if len(v) > 1:
...             print(k, sorted(v))
...
>>> process1()
rock ['Electronic rock', 'Alternative rock', 'Blackpool rock']
baa ['baa baa black sheep', 'baa baa black sheep']
>>> process2()
rock ['Alternative rock', 'Blackpool rock', 'Electronic rock', 'Rock of ages']
>>>
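If you would rather get the result back than print it, the same idea can be wrapped in a function. This is only a sketch: the name shared_words and the ignore_case flag are my own, not part of any library, and it assumes that "sharing a word" means matching whitespace-separated tokens.

```python
from collections import defaultdict

def shared_words(terms, ignore_case=True):
    """Map each word to the phrases containing it, keeping only shared words."""
    index = defaultdict(set)  # a set, so a word repeated in one phrase counts once
    for term in terms:
        for word in term.split():
            key = word.lower() if ignore_case else word
            index[key].add(term)
    # Keep only the words that occur in more than one distinct phrase.
    return {word: sorted(found) for word, found in index.items() if len(found) > 1}

terms = ['Electronic rock', 'Alternative rock', 'Indie pop', 'Rock of ages']
print(shared_words(terms))
print(shared_words(terms, ignore_case=False))
```

With ignore_case=False, 'Rock of ages' no longer matches the lowercase rock, which is the "definition of equality" question again.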
Upvotes: 1
Reputation: 16002
See the question "How to find list intersection?". I think an answer can be worked out from it. Your question doesn't say what result you want, so it would help to show the output you expect.
Here is something that may give you a hint (without split, I don't think it would be clear to understand):
a=terms[0].split()
b=terms[1].split()
list(set(a) & set(b))
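To apply the same intersection to every pair of phrases instead of just the first two, something like the following might work. It is a rough sketch: common_words is a name I made up, and it compares words case-sensitively.

```python
from itertools import combinations

def common_words(terms):
    """Return {(phrase_a, phrase_b): shared words} for pairs that overlap."""
    # Split each phrase once so the sets are not rebuilt for every pair.
    word_sets = [(term, set(term.split())) for term in terms]
    pairs = {}
    for (a, a_words), (b, b_words) in combinations(word_sets, 2):
        shared = a_words & b_words
        if shared:
            pairs[(a, b)] = shared
    return pairs

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
print(common_words(terms))
```

Note that this checks every pair, so the work grows quadratically with the number of phrases.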
Upvotes: 1
Reputation: 30210
This is a terribly inefficient solution for these simple list elements, but for longer strings you could use itertools.combinations to generate every pair of phrases and then difflib to compare the two strings in each pair. If you're just dealing with two or three word phrases, this solution is not for you.
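A minimal sketch of what I mean, using difflib.SequenceMatcher. The function name similar_pairs and the 0.6 threshold are arbitrary choices of mine; you would have to tune the cutoff for your own data.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_pairs(phrases, threshold=0.6):
    """Return (a, b, ratio) for each pair whose similarity ratio meets the threshold."""
    results = []
    for a, b in combinations(phrases, 2):
        # ratio() is a float in [0, 1]; 1.0 means the strings are identical.
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            results.append((a, b, ratio))
    return results

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']
for a, b, ratio in similar_pairs(terms, threshold=0.0):
    print(a, '|', b, '|', round(ratio, 2))
```

This compares character sequences rather than whole words, so it measures overall similarity instead of detecting one shared word.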
Upvotes: 1
Reputation: 838226
You can use a dictionary to remember which words appear in which terms:
from collections import defaultdict

terms = ['Electronic rock', 'Alternative rock', 'Indie pop']

# Map each word to the list of terms that contain it.
d = defaultdict(list)
for term in terms:
    for word in term.split():
        d[word].append(term)

# Report only the words that occur in more than one term.
for k, v in d.items():
    if len(v) > 1:
        print(k, v)
Output:
rock ['Electronic rock', 'Alternative rock']
See it working online: ideone
Upvotes: 6