Reputation: 7255
In this small dataframe:
d1 = pd.read_csv('to_count.mcve.txt', sep='\t')
pos M1 M2 F1
23 A,B,A,C,D A,C,B A
24 A,B,B,C,B A,B,A B
28 C,B,C,D,E B,C E
I want to count how many of the values in F1 are in M1 and M2. As, a learning process I have broken it down into pieces.
Expected final output:
pos M1 M2 F1
23 2 1 1
24 3 1 1
28 1 0 1
What I tried:
d1 = d1.set_index(['pos'], append=True) # to set the pos value aside for simplicity
I can either use any of the methods (which are not totally working though) but want to learn the process.
Then: Do a direct count in one line
d1_count = d1.apply(lambda x: d1.count(d1['F1']))
This is throwing an error message. I know I am doing it wrong. So, what is wrong with my approach? and how could I have done a better job?
Any suggestion with explanation (for any or all methods) is appreciated.
Upvotes: 1
Views: 2159
Reputation: 215047
You can use apply
method to loop through data frame rows and str.count
method to count the number of appearances:
df[['M1', 'M2']] = df.apply(lambda x: x.loc['M1':'M2'].str.count(x.F1), 1)
df
# pos M1 M2 F1
#0 23 2 1 A
#1 24 3 1 B
#2 28 1 0 E
If you want to count all columns except for pos
:
df.set_index('pos').apply(lambda x: x.str.count(x.F1), 1).reset_index()
# pos M1 M2 F1
#0 23 2 1 1
#1 24 3 1 1
#2 28 1 0 1
Upvotes: 2