Александр

Reputation: 17

Count the (total) number of special words in large pandas df

I have a large df with texts:

target = [['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
['existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos'],
['pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania'],
['nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo']]

and 4 sets of words like:

words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre']

Moreover, words from a set may contain spaces, as in 'lugares de trabajo'.

I need to count the total number of times any word from a set appears in each line (I don't need per-word counts), so the resulting df looks like:

  word_set1  word_set2
1         1          1
2         0          1
3         2          1
4         1          0

I tried this to count (I then planned to simply sum the results):

for terms in words1:
    df[str(terms)] = map(lambda x: x.count(str(terms)), target['tokenized'])

but got

TypeError: object of type 'map' has no len()
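In Python 3, map() returns a lazy iterator, and pandas cannot take len() of it when assigning a column, which is what triggers the error above. A minimal sketch of the fix, materialising the iterator with list() (the 'tokenized' column name and a one-row frame are assumed here for illustration):

```python
import pandas as pd

# Hypothetical reconstruction of the failing setup: a one-column
# DataFrame holding the tokenized texts.
target = pd.DataFrame(
    ['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
    columns=['tokenized'],
)
df = pd.DataFrame(index=target.index)

words1 = ['festivales', 'pandemia']
for term in words1:
    # In Python 3, map() yields a lazy iterator; pandas needs something
    # with a length, so wrap it in list() before assigning the column.
    df[term] = list(map(lambda x: x.count(term), target['tokenized']))
```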

Upvotes: 0

Views: 44

Answers (1)

tlentali

Reputation: 3455

We can use the str.count method to get the expected result:

df['word_set1'] = df['text'].str.count('|'.join(words1))
df['word_set2'] = df['text'].str.count('|'.join(words2))

Output :

    text                                                word_set1   word_set2
0   cuantos festivales conciertos sobre todo perso...   2           2
1   existe impresión estar entrando últimos tiempo...   0           1
2   pandemia sigue hambre acecha humanidad faltaba...   3           1
3   nombre maría ángeles todas mujeres sido asesin...   2           0
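Put together as a runnable sketch: the 'text' column is built here by flattening the question's nested list, and re.escape is added because str.count interprets the joined pattern as a regular expression, so words containing special characters would otherwise misbehave:

```python
import re
import pandas as pd

target = [['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
          ['existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos'],
          ['pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania'],
          ['nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo']]

# Flatten the one-element inner lists into a 'text' column.
df = pd.DataFrame({'text': [row[0] for row in target]})

words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre']

# Escape each word before joining into a regex alternation; str.count
# then returns the total number of matches per row across the whole set.
df['word_set1'] = df['text'].str.count('|'.join(map(re.escape, words1)))
df['word_set2'] = df['text'].str.count('|'.join(map(re.escape, words2)))
```

Note that plain substring matching will also hit words embedded inside longer tokens; adding word boundaries around each escaped word tightens this if needed.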

Upvotes: 1
