everestial
everestial

Reputation: 7255

How to count the number of matching items in pandas dataframe column from another column?

In this small dataframe:

d1 = pd.read_csv('to_count.mcve.txt', sep='\t')

pos   M1          M2      F1
23    A,B,A,C,D   A,C,B   A
24    A,B,B,C,B   A,B,A   B
28    C,B,C,D,E   B,C     E

I want to count how many of the values in F1 are in M1 and M2. As, a learning process I have broken it down into pieces.

Expected final output:

pos   M1  M2  F1
23    2   1   1
24    3   1   1
28    1   0   1

What I tried:

d1 = d1.set_index(['pos'], append=True) # to set the pos value aside for simplicity

I can either use any of the methods (which are not totally working though) but want to learn the process.

Then: Do a direct count in one line

d1_count = d1.apply(lambda x: d1.count(d1['F1']))

This is throwing an error message. I know I am doing it wrong. So, what is wrong with my approach? and how could I have done a better job?

Any suggestion with explanation (for any or all methods) is appreciated.

Upvotes: 1

Views: 2159

Answers (1)

akuiper
akuiper

Reputation: 215047

You can use apply method to loop through data frame rows and str.count method to count the number of appearances:

df[['M1', 'M2']] = df.apply(lambda x: x.loc['M1':'M2'].str.count(x.F1), 1)
df

#  pos  M1  M2  F1
#0  23  2   1   A
#1  24  3   1   B
#2  28  1   0   E

If you want to count all columns except for pos:

df.set_index('pos').apply(lambda x: x.str.count(x.F1), 1).reset_index()

#  pos  M1  M2  F1
#0  23  2   1   1
#1  24  3   1   1
#2  28  1   0   1

Upvotes: 2

Related Questions