Count strings in Series Python

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                    "Hello, not number of negations, in this case we need to take care of the negation.",
                    "Hello world, don't number is another case in which where we need to consider negations."]})

I would like to count how many times any of a set of strings appears in each of those sentences. So I simply do:

d = pd.DataFrame(['need'], columns = ['D'])
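# DataFrame.append was removed in pandas 2.0; pd.concat adds the 'number' row instead
df['X'].str.count('|'.join(pd.concat([d, pd.DataFrame({'D': ['number']})], ignore_index=True).D))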

0    1
1    2
2    2
Name: X, dtype: int64
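Series.str.count interprets its argument as a regular expression, which is why the joined 'need|number' pattern counts either word. A minimal sketch of a sturdier pattern for the same call, assuming the word list might later contain arbitrary strings: the re.escape call and the \b word boundaries are illustrative additions, not part of the question, and they change behaviour slightly (for example, 'need' would no longer be counted inside 'needed').

```python
import re

import pandas as pd

df = pd.DataFrame({'X': [
    'Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]})

words = ['need', 'number']

# re.escape protects words that contain regex metacharacters such as '+' or '.';
# \b...\b restricts matches to whole words (an assumption, not in the question).
pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

# Series.str.count counts regex matches in every row at once.
counts = df['X'].str.count(pattern)
print(counts.tolist())  # [1, 2, 2]
```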

However, in my application I need to loop over each row of df, so I do:

res=[]
for i in range(len(df)):
    f = df['X'][i].count('|'.join(pd.concat([d, pd.DataFrame({'D': ['number']})], ignore_index=True).D))
    res.append(f)

>>> res
[0, 0, 0]

I get two different results. The first one is obviously correct.

How can I fix it?

Thanks!

Corralien · Accepted Answer

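The loop uses Python's built-in str.count, which does not understand regular expressions: it searches for the literal text 'need|number', which never occurs, so every count is 0. Series.str.count, by contrast, treats its pattern as a regex. To get the same counts while iterating, use re.findall with iterrows: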

import re

words = ['need', 'number']

res = {}
for idx, row in df.iterrows():
    count = len(re.findall('|'.join(words), row['X']))
    res[idx] = count
df['count'] = pd.Series(res)

Output:

>>> df
                                                   X  count
0  Ciao, I would like to count the number of occu...      1
1  Hello, not number of negations, in this case w...      2
2  Hello world, don't number is another case in w...      2
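If iterrows is not required, a minimal sketch of the same per-row loop can iterate over the column values directly. It compiles the alternation once and reuses it for every row; the 'literal' line reproduces the question's str.count call to show that it searches for the exact text 'need|number'. The variable names are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'X': [
    'Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]})

words = ['need', 'number']
pattern = re.compile('|'.join(words))  # compiled once, reused for every row

# Built-in str.count: no regex matching, so it looks for the literal 'need|number'.
literal = [text.count('|'.join(words)) for text in df['X']]

# findall returns every non-overlapping match of the alternation in each row.
res = [len(pattern.findall(text)) for text in df['X']]
df['count'] = res

print(literal)  # [0, 0, 0]
print(res)      # [1, 2, 2]
```

Iterating over df['X'] avoids building a Series object per row, which iterrows does, so it is usually the simpler choice when only one column is needed.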
