Reputation: 27
I am trying to find a simple way to convert a pandas DataFrame into another DataFrame that holds, for each row, the frequency of each value. An example of what I'm trying to do is below.
Current dataframe example (feature labels are just index values here):
     0  1  2  3  4  ...  n
0    2  3  1  4  2    ~
1    4  3  4  3  2    ~
2    2  3  2  3  2    ~
3    1  3  0  3  2    ~
...
m    ~  ~  ~  ~  ~    ~
Dataframe I would like to convert this to:
     0  1  2  3  4  ...  n
0    0  1  2  1  1    ~
1    0  0  1  2  2    ~
2    0  0  3  2  0    ~
3    1  1  1  2  0    ~
...
m    ~  ~  ~  ~  ~    ~
As you can see, each column label corresponds to a possible value in the original dataframe, and each cell holds the number of times that value occurs in the corresponding row. Is there a simple way to do this in Python? I have a large dataframe that I want to transform into a dataframe of frequencies for feature selection.
If any more information is needed I will update my post.
Upvotes: 1
Views: 93
Reputation: 294278
The advantage of this approach is speed, though it is obviously more complicated.
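import numpy as np
import pandas as pd

n, k = df.shape
i = np.arange(n).repeat(k)   # row position of each flattened value (works for any index labels)
j = np.ravel(df)             # the values themselves, used as column positions
m = j.max() + 1              # number of output columns: values 0..max
a = np.zeros((n, m), int)
np.add.at(a, (i, j), 1)      # unbuffered add, so repeated (i, j) pairs accumulate
pd.DataFrame(a, df.index, range(m))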
   0  1  2  3  4
0  0  1  2  1  1
1  0  0  1  2  2
2  0  0  3  2  0
3  1  1  1  2  0
This produces a row index i that lines up with the flattened values of df, which I assign to j. I then use these index pairs to add one at the positions of the array a designated by i and j.
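For anyone who wants to try the scatter-add idea end to end, here is a minimal self-contained sketch using the sample data from the question. The helper name row_value_counts is hypothetical, and the sketch assumes every value is a non-negative integer. It uses positional row numbers from np.arange rather than the index labels, so it should also work when the index is not 0..n-1.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (4 rows, 5 columns).
df = pd.DataFrame([[2, 3, 1, 4, 2],
                   [4, 3, 4, 3, 2],
                   [2, 3, 2, 3, 2],
                   [1, 3, 0, 3, 2]])

def row_value_counts(frame):
    """Hypothetical helper: per-row counts of each non-negative integer value."""
    n, k = frame.shape
    rows = np.arange(n).repeat(k)        # row position for every cell, row-major
    vals = frame.to_numpy().ravel()      # cell values in the same row-major order
    m = vals.max() + 1                   # one output column per value 0..max
    counts = np.zeros((n, m), dtype=int)
    np.add.at(counts, (rows, vals), 1)   # unbuffered, so duplicates accumulate
    return pd.DataFrame(counts, index=frame.index, columns=range(m))

result = row_value_counts(df)
print(result)
```

Plain fancy-index assignment such as counts[rows, vals] += 1 would count each repeated (row, value) pair only once, which is why np.add.at is used here.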
Upvotes: 1
Reputation: 30920
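Use value_counts with apply (the top-level pd.value_counts is deprecated as of pandas 2.1; pd.Series.value_counts is the equivalent):
df.apply(pd.Series.value_counts, axis=1).fillna(0)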
     0    1    2    3    4
0  0.0  1.0  2.0  1.0  1.0
1  0.0  0.0  1.0  2.0  2.0
2  0.0  0.0  3.0  2.0  0.0
3  1.0  1.0  1.0  2.0  0.0
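The counts come back as floats because the missing combinations are NaN before fillna. A small sketch of one way to get integer counts, using the sample data from the question; the trailing sort_index and astype calls are my additions, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame([[2, 3, 1, 4, 2],
                   [4, 3, 4, 3, 2],
                   [2, 3, 2, 3, 2],
                   [1, 3, 0, 3, 2]])

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
      .fillna(0)                              # values absent from a row count as 0
      .sort_index(axis=1)                     # make sure columns are in value order
      .astype(int)                            # NaN forced floats; cast back to int
)
print(counts)
```

Only values that occur somewhere in df get a column; if a fixed set of columns is needed, reindexing the columns afterwards is one option.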
Alternative: DataFrame.melt with pd.crosstab:
df2 = df.T.melt()
pd.crosstab(df2['variable'], df2['value'])
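To see what the melt and crosstab route produces, here is a short runnable sketch on the sample data from the question. The rename_axis call at the end is an optional extra of mine that drops the 'variable' and 'value' axis names so the result looks like the requested frame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame([[2, 3, 1, 4, 2],
                   [4, 3, 4, 3, 2],
                   [2, 3, 2, 3, 2],
                   [1, 3, 0, 3, 2]])

# After transposing, each original row is a column, so melt() puts the
# original row label in 'variable' and the cell value in 'value'.
df2 = df.T.melt()

# Cross-tabulate row label against value to get per-row frequencies.
freq = pd.crosstab(df2['variable'], df2['value'])
freq = freq.rename_axis(index=None, columns=None)  # cosmetic: drop axis names
print(freq)
```

As with value_counts, crosstab only creates columns for values that actually appear in the data.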
Upvotes: 3