Rollo99

Reputation: 1613

Count strings in Series Python

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                    "Hello, not number of negations, in this case we need to take care of the negation.",
                    "Hello world, don't number is another case in which where we need to consider negations."]})

I would like to count how many times a string appears in those sentences. So I simply do:

d = pd.DataFrame(['need'], columns = ['D'])
df['X'].str.count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))

0    1
1    2
2    2
Name: X, dtype: int64

However, in my application I need to loop over each element of df, which means:

res=[]
for i in range(len(df)):
    f = df['X'][i].count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))
    res.append(f)

[0,0,0]

The two approaches give different results. The first one is the correct one.

How can I fix it?

Thanks!

Upvotes: 1

Views: 69

Answers (2)

Sbunzini

Reputation: 542

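Inside the loop you are calling Python's built-in str.count, which searches for the literal substring 'need|number', whereas Series.str.count treats its argument as a regex. I think the simplest fix is to count the regex matches with re.findall instead. You can try something like this: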

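import re

# Build the pattern once, outside the loop.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here.
pattern = '|'.join(pd.concat([d, pd.DataFrame({'D': ['number']})], ignore_index=True).D)

res = []
for i in range(len(df)):
    text = df['X'][i]  # index with the loop variable i (lowercase)
    # re.findall treats '|' as alternation and returns every match
    count = len(re.findall(pattern, text))
    res.append(count)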
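
To see the difference on the question's own sentences, here is a minimal sketch that uses only the standard library. The variable names (texts, words, literal_counts, regex_counts) are made up for illustration, and re.escape is an extra precaution I added in case a search word ever contains regex metacharacters.

```python
import re

texts = [
    "Ciao, I would like to count the number of occurrences in this text "
    "considering negations that can change the meaning of the sentence",
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]
words = ["need", "number"]  # hypothetical list standing in for column D

# Built-in str.count searches for the literal text "need|number",
# which never appears in any sentence.
literal_counts = [t.count("|".join(words)) for t in texts]

# A compiled regex treats "|" as alternation, so either word matches.
pattern = re.compile("|".join(re.escape(w) for w in words))
regex_counts = [len(pattern.findall(t)) for t in texts]

print(literal_counts)  # [0, 0, 0]
print(regex_counts)    # [1, 2, 2]
```

Note that this counts substrings, so 'number' would also match inside 'numbers'. Whether that is what you want depends on your application.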

Upvotes: 1

Corralien

Reputation: 120409

Use iterrows:

import re

words = ['need', 'number']

res = {}
for idx, row in df.iterrows():
    count = len(re.findall('|'.join(words), row['X']))
    res[idx] = count
df['count'] = pd.Series(res)

Output:

>>> df
                                                   X  count
0  Ciao, I would like to count the number of occu...      1
1  Hello, not number of negations, in this case w...      2
2  Hello world, don't number is another case in w...      2
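
If you only need the column of text and not the whole row, a possible variation is to iterate over df['X'] directly with a precompiled pattern. This is just a sketch under the assumption that the search words are plain strings. The column name 'count' follows the code above, and the rest of the names are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({"X": [
    "Ciao, I would like to count the number of occurrences in this text "
    "considering negations that can change the meaning of the sentence",
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]})

words = ["need", "number"]
# Compile once; re.escape guards against regex metacharacters in the words.
pattern = re.compile("|".join(map(re.escape, words)))

# Element-wise version: apply the counter to every string in the column.
df["count"] = df["X"].apply(lambda text: len(pattern.findall(text)))

# Explicit per-element loop, if other work has to happen for each row.
res = [len(pattern.findall(text)) for text in df["X"]]

print(df["count"].tolist())  # [1, 2, 2]
```

Both forms give the same counts as df['X'].str.count('need|number'), so the choice mostly comes down to what else the loop body has to do.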

Upvotes: 1
