Nilani Algiriyage

Reputation: 35646

Pandas Dataframe count availability of string in a list

Let's say I have a Pandas DataFrame like the following.

In [31]: frame = pd.DataFrame({'a' : ['A/B/C/D', 'A/B/C', 'A/E','D/E/F']})

In [32]: frame
Out[32]: 
         a
0  A/B/C/D
1    A/B/C
2      A/E
3    D/E/F

And I have a list of strings like the following.

In [33]: mylist = ['A/B/C/D', 'A/B/C', 'A/B']

Here, two of the three strings in mylist are present in my DataFrame, so I need output saying 2/3*100 = 67%.

In [34]: pattern = '|'.join(mylist)
In [35]: frame.a.str.contains(pattern).count()

This is not working. How can I get my expected output?

Upvotes: 0

Views: 59

Answers (1)

jrjc

Reputation: 21873

You can do it this way:

In [1]: len(frame[frame.a.isin(mylist)])/float(len(mylist)) * 100
Out[1]: 66.66666666666666
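The `isin` line above counts matching rows of `frame`, which equals the number of matched entries of `mylist` only as long as column `a` contains no duplicate values. A minimal sketch that counts the list entries directly instead, assuming exact string equality is the intended notion of "available" (the names `found` and `percent` are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'a': ['A/B/C/D', 'A/B/C', 'A/E', 'D/E/F']})
mylist = ['A/B/C/D', 'A/B/C', 'A/B']

# For each entry of mylist, check whether it appears exactly in column 'a'
found = pd.Series(mylist).isin(frame['a'])

# The mean of a boolean Series is the fraction of True values
percent = found.mean() * 100
print(round(percent, 2))  # 66.67
```

Because the membership test runs over `mylist` rather than over `frame`, a value repeated in the DataFrame is still counted once.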

Or with your method:

In [2]: pattern = '|'.join(mylist)
In [3]: count = frame.a.str.contains(pattern).sum()  # adds up the True values
In [4]: count/float(len(mylist))*100
Out[4]: 66.66666666666666
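The reason `.count()` in the question did not work: `Series.count()` counts non-missing values, so on a boolean Series it returns the number of rows whether each value is True or False, while `sum()` treats True as 1 and False as 0. A minimal sketch comparing the two on the same data (the name `matches` is illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'a': ['A/B/C/D', 'A/B/C', 'A/E', 'D/E/F']})
mylist = ['A/B/C/D', 'A/B/C', 'A/B']

# True for every row that contains at least one of the patterns
matches = frame['a'].str.contains('|'.join(mylist))

# count() counts non-missing values, so this is the number of rows
print(matches.count())  # 4
# sum() adds up the True values, so this is the number of matching rows
print(matches.sum())    # 2
```

Note that this counts rows matching any pattern, and `str.contains` does substring (regex) matching by default, so a hypothetical row such as 'A/B/X' would also match the pattern 'A/B'.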

Upvotes: 1
