Mario L
Mario L

Reputation: 45

Python 3: very difficult (at least for me) university homework about anagrams

I have this homework that blocked me. I actually found a solution that works fine but only when the data to be analyzed is not very big.

I have a list of strings where every words is an anagrams of some others that we call "generator". For every "generator" the strings that are related can be its anagrams or anagrams + 1 letter. I need to find the "generator" that have the max number of related words.

The example explains better:

The strings are:

the "generator" with max number of words is 'arto' (or it's anagram) and the related words are:

the words excluded are "tratto" and "trotta" because are too long.

it could be that the "generator" is not included into the final result, example:

The "generator" is 'bee'

This is my solution:

def open_r(ftesto_in):
    lista = []
    with open(ftesto_in, encoding='utf8') as f:
        for word in f:
            lista.append(word.strip())
    return lista

def es(ftesto_in):
    lista=open_r(ftesto_in)
    d = {}
    for i in range(len(lista)):
        a = lista[i]
        for j in range(i+1, len(lista)):
            b = lista[j]
            if len(a) == len(b) or len(a) == len(b)+1 or len(a) == len(b)-1:
                    gen = tuple(generator(a,b))
                    if len(gen) == len(a) or len(gen) == len(a)-1:
                        dic_upd(d, gen, a)
                    if len(gen) == len(b) or len(gen) == len(b)-1:
                        dic_upd(d, gen, b)                           
    d = {k: list(set(v)) for k, v in d.items()}
    result = maximum(d)
    return result

def dic_upd(d,k,v):
    if k not in d:
        d[k] = [v]
    else:
        d[k].append(v)

def maximum(d):
    result = 0
    lista = []
    for k,v in d.items():
        count = len(v)
        if count > result:
            result = count
            lista.clear()
            for i in v:
               lista.append(i)
    return lista


def generator(a,b):
    res = []
    a = list(a)
    b = list(b)
    for i in a:
        if i in b:
            res.append(i)
            b.remove(i)
    return sorted(res)

when the number of strings are more than 2-3 hundred or the strings are very long, my program takes a lot of seconds or minutes. Do you have some better idea how find these words? Thanks!

Upvotes: 2

Views: 105

Answers (1)

orlp
orlp

Reputation: 117701

Hint: count the number of occurrences of each character in your input, per word. Can you efficiently solve the problem using only that? E.g.:

        a  o  p  r  t
trota   1  1  0  1  2
tratto  1  1  0  1  3
arto    1  1  0  1  1
parto   1  1  1  1  1
taro    1  1  0  1  1
trotta  1  1  0  1  3
rotta   1  1  0  1  2
orta    1  1  0  1  1
porta   1  1  1  1  1

Upvotes: 1

Related Questions