Python 3: very difficult (at least for me) university homework about anagrams

Question

I have this homework that blocked me. I actually found a solution that works fine but only when the data to be analyzed is not very big.

I have a list of strings where every words is an anagrams of some others that we call "generator". For every "generator" the strings that are related can be its anagrams or anagrams + 1 letter. I need to find the "generator" that have the max number of related words.

The example explains better:

The strings are:

trota
tratto
arto
parto
taro
trotta
rotta
orta
porta

the "generator" with max number of words is 'arto' (or it's anagram) and the related words are:

arto
orta (anagrams of 'arto')
parto (anagrams of 'arto' + letter p)
porta (anagrams of 'arto' + letter p)
rotta (anagrams of 'arto' + letter t)
taro (anagrams of 'arto')
trota (anagrams of 'arto' + letter t)

the words excluded are "tratto" and "trotta" because are too long.

it could be that the "generator" is not included into the final result, example:

beel
baee
beae

The "generator" is 'bee'

This is my solution:

def open_r(ftesto_in):
    lista = []
    with open(ftesto_in, encoding='utf8') as f:
        for word in f:
            lista.append(word.strip())
    return lista

def es(ftesto_in):
    lista=open_r(ftesto_in)
    d = {}
    for i in range(len(lista)):
        a = lista[i]
        for j in range(i+1, len(lista)):
            b = lista[j]
            if len(a) == len(b) or len(a) == len(b)+1 or len(a) == len(b)-1:
                    gen = tuple(generator(a,b))
                    if len(gen) == len(a) or len(gen) == len(a)-1:
                        dic_upd(d, gen, a)
                    if len(gen) == len(b) or len(gen) == len(b)-1:
                        dic_upd(d, gen, b)                           
    d = {k: list(set(v)) for k, v in d.items()}
    result = maximum(d)
    return result

def dic_upd(d,k,v):
    if k not in d:
        d[k] = [v]
    else:
        d[k].append(v)

def maximum(d):
    result = 0
    lista = []
    for k,v in d.items():
        count = len(v)
        if count > result:
            result = count
            lista.clear()
            for i in v:
               lista.append(i)
    return lista


def generator(a,b):
    res = []
    a = list(a)
    b = list(b)
    for i in a:
        if i in b:
            res.append(i)
            b.remove(i)
    return sorted(res)

when the number of strings are more than 2-3 hundred or the strings are very long, my program takes a lot of seconds or minutes. Do you have some better idea how find these words? Thanks!

Python 3: very difficult (at least for me) university homework about anagrams

Answers (1)

Related Questions