Fabio Lamanna

Reputation: 21552

pandas get frequency of values on multiple columns

I have this kind of dataframe df:

df = pd.DataFrame({'c':[1,1,2,2,3,3],'L0':['a','a','b','c','d','e'],'L1':['a','b','c','e','f','e']})

I'm now trying to get the frequency of each value in columns L0 and L1, for each value of c. The expected output would be:

c    a    b    c    d    e    f
1    3    1    0    0    0    0
2    0    1    2    0    1    0
3    0    0    0    1    2    1

I thought I could use something like:

df.pivot_table(index='c', columns=np.unique(['L0','L1']), aggfunc=f)

but I can't figure out how to define f, which should be a function able to compute value_counts() over multiple columns.

Upvotes: 2

Views: 4268

Answers (3)

Nickil Maveli

Reputation: 29711

You could use the crosstab method, which by default computes the frequency table of the factors, as shown:

>>> df_1 = pd.crosstab(df.c, df.L0)
>>> df_2 = pd.crosstab(df.c, df.L1)

>>> df_1.add(df_2, fill_value=0)

   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1
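Put together as a self-contained sketch (the dataframe is reconstructed from the question; the final `.astype(int)` is my addition, since `add` with `fill_value` can promote the counts to float):

```python
import pandas as pd

# Reconstruction of the df from the question.
df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

# One crosstab per label column, added element-wise; fill_value=0
# covers labels that appear in only one of the two tables.
result = (pd.crosstab(df.c, df.L0)
            .add(pd.crosstab(df.c, df.L1), fill_value=0)
            .astype(int))  # add() with fill_value may promote to float
print(result)
```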

Upvotes: 3

IanS

Reputation: 16241

Edit: This is slightly simpler:

In[48]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).unstack().value_counts().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)

See below for explanations.


The function you're looking for is groupby, not pivot. You can then apply value_counts to each group of c values individually.

This is close to what you were looking for:

In[39] : df.groupby('c').apply(lambda df1: 
             df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose()
         )
Out[39]: 
       a   b   c   d   e   f
c                           
1 L0   2 NaN NaN NaN NaN NaN
  L1   1   1 NaN NaN NaN NaN
2 L0 NaN   1   1 NaN NaN NaN
  L1 NaN NaN   1 NaN   1 NaN
3 L0 NaN NaN NaN   1   1 NaN
  L1 NaN NaN NaN NaN   1   1

In order to sum the values, the end result is surprisingly complicated:

In[46]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose().sum().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)
Out[46]: 
   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1
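For comparison, the same table can be built without apply by melting first and letting a grouped value_counts do the counting (a sketch of an equivalent formulation, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

# melt turns L0/L1 into one long 'value' column keyed by c; a grouped
# value_counts plus unstack then yields the frequency table directly.
result = (df.melt(id_vars='c')
            .groupby('c')['value']
            .value_counts()
            .unstack(fill_value=0))
print(result)
```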

Upvotes: 1

knagaev

Reputation: 2957

I didn't try to define f, but here is another way to solve your problem:

In [356]: df.set_index('c').stack().reset_index().groupby(['c', 0]).count().unstack().fillna(0)
Out[356]: 
  level_1                         
0       a    b    c    d    e    f
c                                 
1     3.0  1.0  0.0  0.0  0.0  0.0
2     0.0  1.0  2.0  0.0  1.0  0.0
3     0.0  0.0  0.0  1.0  2.0  1.0
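The leftover level_1 column level and the float counts can be tidied up; here is a sketch of the same stack/groupby/unstack idea with those cleanups (the droplevel call and fill_value=0 are my additions):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

# Stack L0/L1 into one long column, count each (c, label) pair,
# then unstack the labels back into columns.
out = (df.set_index('c')
         .stack()
         .reset_index(name='val')
         .groupby(['c', 'val'])
         .count()
         .unstack(fill_value=0))
out.columns = out.columns.droplevel(0)  # drop the leftover 'level_1' level
print(out)
```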

Upvotes: 1