Reputation: 2825
I have a data set like this:
index sentence
1 bobby went to the gym
1 sally the bad
1 days are good
2 sunny side up
2 the weird
I want to count how many times 'the' appears in the columns 'sentence' by index:
index count_the
1 2
2 1
how would I do this in pandas?
Upvotes: 0
Views: 50
Reputation: 2472
Also you can use groupby()+ apply():
df.groupby('index').apply(lambda x: x['sentence'].str.contains(r'.*the').sum()).reset_index(name = 'count_the')
or groupby()+ apply():
df.groupby('index').agg({'sentence': lambda x: x.str.contains(r'.*the').sum()}).reset_index(name = 'count_the')
Upvotes: 0
Reputation: 323226
one way from findall
, notice I treat the index columns as index here
df.sentence.str.findall(r'\bthe\b').str.len().sum(level=0)
Out[363]:
index
1 2
2 1
Name: sentence, dtype: int64
Upvotes: 1
Reputation: 1317
df = pd.DataFrame({'index' :[1,1,1,2,2],'sentence':['bobby went to the gym','sally the bad','days are good','sunny side up','the weird']})
df['counts'] = df['sentence'].str.count('the')
print(df.groupby('index')['counts'].sum())
Upvotes: 0
Reputation: 42916
First groupby.Series.apply
, then use series.str.count
:
df = df.groupby('index').sentence.apply(' '.join).reset_index()
print(df)
index sentence
0 1 bobby went to the gym sally the bad days are good
1 2 sunny side up the weird
df['count_the'] = df.sentence.str.count('the')
print(df.drop(['sentence'],axis=1))
index count_the
0 1 2
1 2 1
Upvotes: 1