Retrieve Dataframe value(s) if the percentage of a string existing in another Dataframe row is >= 75%

Question

Well, this is burning my brain, so I decided to ask here.
I have two existing Dataframes. The first one contains a short description of a physical object, and the other usually has a detailed description of that same object.

The problem is that these descriptions don't always have the same number of words, so there may be different descriptions for the same object.

Since the dataframes don't have any other columns that would make it possible to match the correct information, my idea is to match each short description to a detailed one by the percentage of its words that are contained in the second dataframe's description.

Dataframe 1:

Item
Livro didatico - Ciência da Computação, Física, Sistemas de Informação, Química e Matemática
Livro didatico - Eng. Civil, Eng. Elétrica, Eng. Mecânica, Eng. de Produção e Sistemas, Tecnologia Mecânica – Produção Industrial de Móveis, Tecnologia de Sistemas de Informação, Eng. do Petróleo, Eng. de Pesca, Eng. Sanitária e Eng. de Software.
Livro didatico - Ciências da Saúde Enfermagem, Fisioterapia, Educação Física

Dataframe 2 (Detailed object description):

Item
Livro didatico - pedagogico Diversos ( para aplicacao direta) Livros nacionais na área de Ciências Exatas e da Terra Ciência da Computação, Física, Sistemas de Informação, Química e Matemática
Livro didatico - pedagogico Diversos ( para aplicacao direta) Livros nacionais na área de Engenharias Eng. Civil, Eng. Elétrica, Eng. Mecânica, Eng. de Produção e Sistemas, Tecnologia Mecânica – Produção Industrial de Móveis, Tecnologia de Sistemas de Informação, Eng. do Petróleo, Eng. de Pesca, Eng. Sanitária e Eng. de Software.
Livro didatico - pedagogico Diversos ( para aplicacao direta) Livros nacionais na área de Ciências da Saúde Enfermagem, Fisioterapia, Educação Física

I have already removed special characters and letter accents to make the search easier.
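For reference, a cleanup step like the one described could look roughly like the sketch below. The function name `normaliza` and the exact rules (lowercasing, turning punctuation into spaces) are assumptions for illustration, not the code actually used.

```python
import re
import unicodedata


def normaliza(texto: str) -> str:
    """Hypothetical helper: lowercase, strip accents, drop punctuation."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposto = unicodedata.normalize("NFKD", texto)
    sem_acentos = "".join(c for c in decomposto if not unicodedata.combining(c))
    # Anything that is not a lowercase letter, digit, or whitespace becomes a space.
    limpo = re.sub(r"[^a-z0-9\s]", " ", sem_acentos.lower())
    # Collapse runs of whitespace into single spaces.
    return " ".join(limpo.split())
```

Applied to a column, this would be something like `df["Item"].map(normaliza)`.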

This is the method I used to calculate the percentage between two different strings, but I want to do this between the two dataframes. Is it possible in a pythonic way, or will I have to iterate over every row to match the desired string?

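# Static method of a class that is not shown here.
@staticmethod
def valida_descricao(string_contida: str, string_completa: str) -> float:
    """Return the percentage (0-100) of words in string_contida found in string_completa."""
    tokens = string_contida.split()
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty description

    # Compare whole words; `token in string_completa` would also match substrings.
    palavras_completas = set(string_completa.split())
    encontrados = sum(token in palavras_completas for token in tokens)

    return 100 * encontrados / len(tokens)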
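One possible way to run this comparison between the two dataframes is a cross join followed by a filter, sketched below. It assumes pandas 1.2 or later (for `how="cross"`), that both dataframes have a column named `Item`, and that the text has already been cleaned. The names `percentual_contido` and `casa_descricoes` are hypothetical, and the sample rows are shortened versions of the data above.

```python
import pandas as pd


def percentual_contido(curta: str, completa: str) -> float:
    """Percentage (0-100) of the words in `curta` that appear in `completa`."""
    tokens = curta.split()
    if not tokens:
        return 0.0
    palavras = set(completa.split())
    return 100 * sum(t in palavras for t in tokens) / len(tokens)


def casa_descricoes(df1, df2, coluna="Item", limite=75.0):
    """Pair every row of df1 with every row of df2 and keep pairs >= limite."""
    # The cross join yields len(df1) * len(df2) candidate pairs.
    pares = df1[[coluna]].merge(
        df2[[coluna]], how="cross", suffixes=("_curta", "_completa")
    )
    pares["percentual"] = [
        percentual_contido(curta, completa)
        for curta, completa in zip(pares[f"{coluna}_curta"], pares[f"{coluna}_completa"])
    ]
    return pares[pares["percentual"] >= limite].reset_index(drop=True)


df1 = pd.DataFrame({"Item": [
    "livro didatico ciencia da computacao fisica",
    "livro didatico ciencias da saude enfermagem",
]})
df2 = pd.DataFrame({"Item": [
    "livro didatico pedagogico diversos ciencias exatas ciencia da computacao fisica quimica",
    "livro didatico pedagogico diversos ciencias da saude enfermagem fisioterapia",
]})
resultado = casa_descricoes(df1, df2)
```

The scoring still runs once per pair, so this does not avoid the row-by-row work; it only moves the pairing into pandas. Words shared by every description (such as "livro didatico") raise the score of wrong pairs too, so the threshold may need tuning, and for large dataframes the cross join can become expensive.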
