Reputation: 6159
my_list=["one","is"]
df
Out[6]:
    Name                                         Story
0  Kumar  Kumar is one of the great player in his team
1   Ravi                           Ravi is a good poet
2    Ram                               Ram drives well
If any of the items in my_list is present in the "Story" column, I need to get the number of occurrences of each item.
my_desired_output
new_df
word count
one 1
is 2
I managed to extract the rows that contain any of the items in my_list using
mask=df["Story"].str.contains('|'.join(my_list),na=False) but now I am trying to get the count of each word in my_list.
Upvotes: 1
Views: 92
Reputation: 862761
You can use str.split with stack to get a Series of words first:
a = df['Story'].str.split(expand=True).stack()
print (a)
0  0     Kumar
   1        is
   2       one
   3        of
   4       the
   5     great
   6    player
   7        in
   8       his
   9      team
1  0      Ravi
   1        is
   2         a
   3      good
   4      poet
2  0       Ram
   1    drives
   2      well
dtype: object
Then filter by boolean indexing with isin, get the counts with value_counts, and to get a DataFrame add rename_axis and reset_index:
df = a[a.isin(my_list)].value_counts().rename_axis('word').reset_index(name='count')
print (df)
word count
0 is 2
1 one 1
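For anyone who wants to run the steps above end to end, here is a minimal sketch that rebuilds the sample frame from the question. Two things in it are my own assumptions and not part of the answer above: the helper name count_words is hypothetical, and the reindex step is added so that the result follows the order of my_list and reports 0 for a word that never occurs.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})
my_list = ["one", "is"]


def count_words(stories, words):
    """Count whole-word occurrences of each entry of `words` (hypothetical helper)."""
    # One word per row: split on whitespace, then stack the columns into a Series
    tokens = stories.str.split(expand=True).stack()
    # Keep only the words of interest and count them
    counts = tokens[tokens.isin(words)].value_counts()
    # Assumption: reindex so the output follows `words` and missing words get 0
    return (counts.reindex(words, fill_value=0)
                  .rename_axis("word")
                  .reset_index(name="count"))


new_df = count_words(df["Story"], my_list)
print(new_df)
```

Because the text is split on whitespace, only whole words are counted, so "is" inside "his" is not included, unlike a str.contains match on 'one|is'.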
Another solution is to create a list of all words with str.split, flatten it with chain.from_iterable, count with Counter, and finally create the DataFrame with the constructor:
from collections import Counter
from itertools import chain
import pandas as pd
my_list=["one","is"]
a = list(chain.from_iterable(df['Story'].str.split().values.tolist()))
print (a)
['Kumar', 'is', 'one', 'of', 'the', 'great', 'player',
'in', 'his', 'team', 'Ravi', 'is', 'a', 'good', 'poet', 'Ram', 'drives', 'well']
b = Counter([x for x in a if x in my_list])
print (b)
Counter({'is': 2, 'one': 1})
df = pd.DataFrame({'word':list(b.keys()),'count':list(b.values())}, columns=['word','count'])
print (df)
word count
0 one 1
1 is 2
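A more compact variant of the Counter approach might look like the sketch below. It is not the code from above but a rewrite under a few assumptions of mine: membership is tested against a set, missing values are dropped before splitting, and the frame is built from my_list directly so the row order is fixed and words that never occur get a count of 0.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample "Story" column copied from the question
stories = pd.Series([
    "Kumar is one of the great player in his team",
    "Ravi is a good poet",
    "Ram drives well",
])
my_list = ["one", "is"]

wanted = set(my_list)  # set lookup instead of scanning the list for every word

# Assumption: drop missing stories first, since NaN cannot be iterated over
words = chain.from_iterable(stories.dropna().str.split())
b = Counter(w for w in words if w in wanted)

# A Counter returns 0 for a missing key, so every entry of my_list gets a row
new_df = pd.DataFrame({"word": my_list, "count": [b[w] for w in my_list]})
print(new_df)
```

Building the rows from my_list also avoids relying on the key order of the Counter, which depends on insertion order and on the Python version.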
Upvotes: 1