Reputation: 21552
I have this kind of dataframe df
:
df = pd.DataFrame({'c':[1,1,2,2,3,3],'L0':['a','a','b','c','d','e'],'L1':['a','b','c','e','f','e']})
I'm now trying to get the frequency of each values in columns L0
and L1
for each value of c
. The expected output would be:
c a b c d e f
1 3 1 0 0 0 0
2 0 1 2 0 1 0
3 0 0 0 1 2 1
I thought I could use something like:
df.pivot_table(index='c', columns=np.unique(['L0','L1']), aggfunc=f)
but I can't figured out how to describe f
that should be a function able to get the value_counts()
over multiple columns.
Upvotes: 2
Views: 4268
Reputation: 29711
You could use the crosstab
method which by default computes the frequency table of the factors as shown:
>>> df_1 = pd.crosstab(df.c, df.L0)
>>> df_2 = pd.crosstab(df.c, df.L1)
>>> df_1.add(df_2, fill_value=0)
a b c d e f
c
1 3 1 0 0 0 0
2 0 1 2 0 1 0
3 0 0 0 1 2 1
Upvotes: 3
Reputation: 16241
Edit: This is slightly simpler:
In[48]: df.groupby('c').apply(lambda df1:
df1.drop('c', axis=1).unstack().value_counts().to_frame().transpose()
).reset_index(level=1, drop=True).fillna(0)
See below for explanations.
The function you're looking for is groupby
, not pivot
. You can then use value_counts
on each dataframe, grouped by c
values, individually.
This is close to what you were looking for:
In[39] : df.groupby('c').apply(lambda df1:
df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose()
)
Out[39]:
a b c d e f
c
1 L0 2 NaN NaN NaN NaN NaN
L1 1 1 NaN NaN NaN NaN
2 L0 NaN 1 1 NaN NaN NaN
L1 NaN NaN 1 NaN 1 NaN
3 L0 NaN NaN NaN 1 1 NaN
L1 NaN NaN NaN NaN 1 1
In order to sum the values, the end result is surprisingly complicated:
In[46]: df.groupby('c').apply(lambda df1:
df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose().sum().to_frame().transpose()
).reset_index(level=1, drop=True).fillna(0)
Out[46]:
a b c d e f
c
1 3 1 0 0 0 0
2 0 1 2 0 1 0
3 0 0 0 1 2 1
Upvotes: 1
Reputation: 2957
Didn't try to describe f but want to solve your problem the other way
In [356]: df.set_index('c').stack().reset_index().groupby(['c', 0]).count().unstack().fillna(0)
Out[356]:
level_1
0 a b c d e f
c
1 3.0 1.0 0.0 0.0 0.0 0.0
2 0.0 1.0 2.0 0.0 1.0 0.0
3 0.0 0.0 0.0 1.0 2.0 1.0
Upvotes: 1