Reputation: 45
I have this homework that blocked me. I actually found a solution that works fine but only when the data to be analyzed is not very big.
I have a list of strings where every words is an anagrams of some others that we call "generator". For every "generator" the strings that are related can be its anagrams or anagrams + 1 letter. I need to find the "generator" that have the max number of related words.
The example explains better:
The strings are:
the "generator" with max number of words is 'arto' (or it's anagram) and the related words are:
the words excluded are "tratto" and "trotta" because are too long.
it could be that the "generator" is not included into the final result, example:
The "generator" is 'bee'
This is my solution:
def open_r(ftesto_in):
lista = []
with open(ftesto_in, encoding='utf8') as f:
for word in f:
lista.append(word.strip())
return lista
def es(ftesto_in):
lista=open_r(ftesto_in)
d = {}
for i in range(len(lista)):
a = lista[i]
for j in range(i+1, len(lista)):
b = lista[j]
if len(a) == len(b) or len(a) == len(b)+1 or len(a) == len(b)-1:
gen = tuple(generator(a,b))
if len(gen) == len(a) or len(gen) == len(a)-1:
dic_upd(d, gen, a)
if len(gen) == len(b) or len(gen) == len(b)-1:
dic_upd(d, gen, b)
d = {k: list(set(v)) for k, v in d.items()}
result = maximum(d)
return result
def dic_upd(d,k,v):
if k not in d:
d[k] = [v]
else:
d[k].append(v)
def maximum(d):
result = 0
lista = []
for k,v in d.items():
count = len(v)
if count > result:
result = count
lista.clear()
for i in v:
lista.append(i)
return lista
def generator(a,b):
res = []
a = list(a)
b = list(b)
for i in a:
if i in b:
res.append(i)
b.remove(i)
return sorted(res)
when the number of strings are more than 2-3 hundred or the strings are very long, my program takes a lot of seconds or minutes. Do you have some better idea how find these words? Thanks!
Upvotes: 2
Views: 105
Reputation: 117701
Hint: count the number of occurrences of each character in your input, per word. Can you efficiently solve the problem using only that? E.g.:
a o p r t
trota 1 1 0 1 2
tratto 1 1 0 1 3
arto 1 1 0 1 1
parto 1 1 1 1 1
taro 1 1 0 1 1
trotta 1 1 0 1 3
rotta 1 1 0 1 2
orta 1 1 0 1 1
porta 1 1 1 1 1
Upvotes: 1