Reputation: 123
I have two lists named 'query' and 'data', both of which contain strings. I need to count how many anagrams of each string in 'query' there are in 'data'.
For example for the following two lists:
query = ['no', 'result', 'oh', 'abc', 'temper']
data = ['no', 'on', 'bca', 'oh', 'cba', 'repmet', 'serult', 'pemter', 'tluser', 'tlures', 'pterem', 'temrep']
the output would be a dict with the anagram counts for each word:
{'no': 2, 'result': 3, 'oh': 1, 'abc': 2, 'temper': 4}
I have an initial brute force solution using nested loops but was wondering how I should go about optimizing this since it is pretty slow when the lists get larger.
dict1 = {}
data.sort()
data.sort(key=len, reverse=False)
for idx in range(len(query)):
dict1[query[idx]] = 0
x = sorted(query[idx])
for idx2 in range(len(data)):
if len(data[idx2]) > len(query[idx]):
break
if data[idx2] == query[idx]:
dict1[query[idx]] += 1
elif x == sorted(data[idx2]):
dict1[query[idx]] += 1
Upvotes: 0
Views: 264
Reputation: 51998
You could use a Counter object:
from collections import Counter
query = ['no', 'result', 'oh', 'abc', 'temper']
data = ['no', 'on', 'bca', 'oh', 'cba', 'repmet', 'serult', 'pemter', 'tluser', 'tlures', 'pterem', 'temrep']
counts = Counter(''.join(sorted(word)) for word in data)
anagram_counts = {k:counts[''.join(sorted(k))] for k in query}
print(anagram_counts) #prints {'no': 2, 'result': 3, 'oh': 1, 'abc': 2, 'temper': 4}
This has linear complexity whereas your nested loops approach has quadratic complexity. Even without using a Counter object, you can get linear complexity: one pass over data
to create a dictionary of counts and a subsequent pass over query
, using the dictionary constructed in the first loop to create the target dictionary.
Upvotes: 2