Reputation: 2378
I have a pandas.DataFrame that looks like this:
COL1 COL2 COL3
C1 None None
C1 C2 None
C1 C1 None
C1 C2 C3
For each row in this DataFrame I would like to count the occurrences of each of C1, C2 and C3 and append these counts as columns to the DataFrame. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:
COL1 COL2 COL3 C1 C2 C3
C1 None None 1 0 0
C1 C2 None 1 1 0
C1 C1 None 2 0 0
C1 C2 C3 1 1 1
So, I have created a Series with C1, C2 and C3 as the values. One way to count them is to loop over the rows and columns of the DataFrame, then over this Series, and increment a counter whenever a cell matches. But is there an apply approach that can achieve this in a compact fashion?
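For comparison, the loop-based approach described in the question might look roughly like the sketch below. It assumes the empty cells hold the string 'None' (as the answers below assume); the names values, counts and result are illustrative only.

```python
import pandas as pd

# Sample frame from the question; empty cells are assumed to be the string 'None'
df = pd.DataFrame({
    'COL1': ['C1', 'C1', 'C1', 'C1'],
    'COL2': ['None', 'C2', 'C1', 'C2'],
    'COL3': ['None', 'None', 'None', 'C3'],
})
values = pd.Series(['C1', 'C2', 'C3'])  # the values to count

# Loop over rows, then over the values, then over the cells of the row
counts = {v: [] for v in values}
for _, row in df.iterrows():
    for v in values:
        n = 0
        for cell in row:
            if cell == v:
                n += 1  # increment the counter on each match
        counts[v].append(n)

# Append the per-row counts as new columns
result = df.join(pd.DataFrame(counts, index=df.index))
print(result)
```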
Upvotes: 17
Views: 23136
Reputation: 323396
Applying a Series-returning function (apply + Series) to a whole DataFrame usually slows the whole process down. Additional reading: Link
df.mask(df.eq('None')).stack().str.get_dummies().groupby(level=0).sum()  # older pandas: .sum(level=0)
Out[165]:
C1 C2 C3
0 1 0 0
1 1 1 0
2 2 0 0
3 1 1 1
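To append these counts to the original frame, as the question asks, the result can be joined back on the row index. This is a minimal sketch, assuming the empty cells hold the string 'None'. The reindex step guards against a row whose cells are all 'None', which may be dropped by stack() in some pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['C1', 'C1', 'C1', 'C1'],
    'COL2': ['None', 'C2', 'C1', 'C2'],
    'COL3': ['None', 'None', 'None', 'C3'],
})

# Hide the 'None' placeholders, stack to one entry per (row, column),
# one-hot encode each value and add up the dummies per original row
counts = (
    df.mask(df.eq('None'))
      .stack()
      .str.get_dummies()
      .groupby(level=0)
      .sum()
)
# Rows whose cells are all 'None' may be absent after stack(); give them zeros
counts = counts.reindex(df.index, fill_value=0)
result = df.join(counts)
print(result)
```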
Or you can do it with collections.Counter:
from collections import Counter
pd.DataFrame([Counter(x) for x in df.values]).drop(columns='None')
Out[170]:
C1 C2 C3
0 1 NaN NaN
1 1 1.0 NaN
2 2 NaN NaN
3 1 1.0 1.0
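The Counter result still has NaN wherever a value does not occur in a row. Here is a sketch that turns it into integer counts for a chosen list of values and appends them to df. The wanted list is an illustrative assumption, and reindex is used in place of drop so that 'None' is excluded implicitly.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'COL1': ['C1', 'C1', 'C1', 'C1'],
    'COL2': ['None', 'C2', 'C1', 'C2'],
    'COL3': ['None', 'None', 'None', 'C3'],
})
wanted = ['C1', 'C2', 'C3']  # illustrative choice of values to keep

# One Counter per row; keys missing from a row become NaN in the frame
counts = pd.DataFrame([Counter(row) for row in df.to_numpy()], index=df.index)
# Keep only the wanted values (this also drops 'None'), then 0-fill and cast
counts = counts.reindex(columns=wanted).fillna(0).astype(int)
result = df.join(counts)
print(result)
```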
Upvotes: 1
Reputation: 77027
Andy's answer is spot on.
I'm adding this answer for the case where the list C1, C2, ..., Cn is huge and we only want to view a subset of them.
dff = df.copy()
dff['C1'] = (df == 'C1').sum(axis=1)  # number of cells equal to 'C1' in each row
dff['C2'] = (df == 'C2').sum(axis=1)
dff['C3'] = (df == 'C3').sum(axis=1)
dff
COL1 COL2 COL3 C1 C2 C3
0 C1 None None 1 0 0
1 C1 C2 None 1 1 0
2 C1 C1 None 2 0 0
3 C1 C2 C3 1 1 1
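When the subset is held in a list, the same per-row comparison can be written once with a dict comprehension. This is a sketch in which the subset ['C1', 'C3'] is hypothetical. Because assign returns a new frame, df itself is left unchanged and no explicit copy is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['C1', 'C1', 'C1', 'C1'],
    'COL2': ['None', 'C2', 'C1', 'C2'],
    'COL3': ['None', 'None', 'None', 'C3'],
})
wanted = ['C1', 'C3']  # hypothetical subset of values to view

# df.eq(v) marks matching cells; summing across columns counts them per row
dff = df.assign(**{v: df.eq(v).sum(axis=1) for v in wanted})
print(dff)
```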
Upvotes: 4
Reputation: 375925
You could apply value_counts:
In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]:
C1 C2 C3 None
0 1 NaN NaN 2
1 1 1 NaN 1
2 2 NaN NaN 1
3 1 1 1 NaN
So you can keep just the base values you want and fill the NaNs with 0:
In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]:
C1 C2 C3
0 1 0 0
1 1 1 0
2 2 0 0
3 1 1 1
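Selecting with [['C1', 'C2', 'C3']] raises a KeyError if one of those values never occurs anywhere in the frame. The sketch below uses reindex instead, then appends integer counts to df. 'C4' is a hypothetical value, included only to show that case.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['C1', 'C1', 'C1', 'C1'],
    'COL2': ['None', 'C2', 'C1', 'C2'],
    'COL3': ['None', 'None', 'None', 'C3'],
})
wanted = ['C1', 'C2', 'C3', 'C4']  # 'C4' is hypothetical and never occurs

counts = (
    df.apply(pd.Series.value_counts, axis=1)
      .reindex(columns=wanted)  # unlike [[...]], no KeyError for absent values
      .fillna(0)
      .astype(int)
)
result = df.join(counts)
print(result)
```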
Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).
Upvotes: 22