Need help getting the frequency of each number in a pandas dataframe

Question

I am trying to find a simple way to convert a pandas dataframe into another dataframe that holds, for each row, the frequency of each value. I've provided an example of what I'm trying to do below.

Current dataframe example (feature labels are just index values here):

   0   1   2   3   4   ...   n
0  2   3   1   4   2         ~
1  4   3   4   3   2         ~
2  2   3   2   3   2         ~
3  1   3   0   3   2         ~
...
m  ~   ~   ~   ~   ~         ~

Dataframe I would like to convert this to:

   0   1   2   3   4   ...   n
0  0   1   2   1   1         ~
1  0   0   1   2   2         ~
2  0   0   3   2   0         ~
3  1   1   1   2   0         ~
...
m  ~   ~   ~   ~   ~         ~

As you can see, each column label corresponds to one of the possible numbers in the dataframe, and each cell holds the number of times that number occurs in the corresponding row. Is there a simple way to do this in Python? I have a large dataframe that I am trying to transform into a dataframe of frequencies for feature selection.

If any more information is needed I will update my post.

ansev · Accepted Answer

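Use pd.value_counts with apply along axis=1 to count the values row by row; fillna(0) then fills in a zero for every value a row does not contain: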

df.apply(pd.value_counts, axis=1).fillna(0)

     0    1    2    3    4
0  0.0  1.0  2.0  1.0  1.0
1  0.0  0.0  1.0  2.0  2.0
2  0.0  0.0  3.0  2.0  0.0
3  1.0  1.0  1.0  2.0  0.0
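For anyone who wants to run this end to end, here is a minimal sketch using the four sample rows from the question. Two things in it are my own assumptions rather than part of the answer above: it uses pd.Series.value_counts, because the top-level pd.value_counts is, as far as I know, deprecated in recent pandas releases, and it assumes the possible values are the integers from 0 up to the largest value in the frame, so that a value that never occurs still gets a column of zeros.

```python
import pandas as pd

# Sample data copied from the question (four rows, five columns).
df = pd.DataFrame(
    [[2, 3, 1, 4, 2],
     [4, 3, 4, 3, 2],
     [2, 3, 2, 3, 2],
     [1, 3, 0, 3, 2]]
)

# Count the values within each row; a value missing from a row yields NaN.
counts = df.apply(pd.Series.value_counts, axis=1)

# Assumption: the possible values are the integers 0..max of the frame.
# Reindexing adds a column for any value that never occurs at all,
# then NaN is replaced by 0 and the counts are cast back to integers.
all_values = range(int(df.to_numpy().max()) + 1)
counts = counts.reindex(columns=all_values).fillna(0).astype(int)

print(counts)
```

The astype(int) step is optional; it only avoids the float columns that appear once NaN has been introduced.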

Alternatively, use DataFrame.melt with pd.crosstab:

df2 = df.T.melt()
pd.crosstab(df2['variable'], df2['value'])
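A self-contained sketch of the same melt and crosstab idea, again on the sample rows from the question (the names long_df and freq are only illustrative). After transposing, each original row becomes a column, so melt produces one line per cell with the original row label in 'variable' and the cell content in 'value'.

```python
import pandas as pd

# Sample data copied from the question (four rows, five columns).
df = pd.DataFrame(
    [[2, 3, 1, 4, 2],
     [4, 3, 4, 3, 2],
     [2, 3, 2, 3, 2],
     [1, 3, 0, 3, 2]]
)

# Transpose so every original row becomes a column, then melt to long form:
# 'variable' holds the original row label, 'value' holds the cell content.
long_df = df.T.melt()

# Cross-tabulate row label against value to get integer counts per row.
freq = pd.crosstab(long_df["variable"], long_df["value"])

print(freq)
```

Unlike the value_counts version, this returns integer counts directly, but it only creates columns for values that actually occur somewhere in the frame.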
