Reputation: 939
I have the following DataFrame:
df = pd.DataFrame({'colA':['a', 'a', 'a', 'b' ,'b'], 'colB':['a', 'b', 'a', 'c', 'b'], 'colC':['x', 'x', 'y', 'y', 'y']})
I would like to write a function that replaces each value with its frequency count in that column. For example, colA would become [3, 3, 3, 2, 2].
I have attempted to do this by creating a dictionary of each value and its frequency count, assigning that dictionary to a variable freq, and then mapping the column values to freq. I have written the following function:
def LabelEncode_method1(col):
    freq = col.value_counts().to_dict()
    col = col.map(freq)
    return col.head()
When I run LabelEncode_method1(df.colA), I get the result 3, 3, 3, 2, 2. However, when I then inspect the DataFrame df, the values in colA are still 'a', 'a', 'a', 'b', 'b'.
How can I make the change show up in df, and apply the mapping to all columns at once, as opposed to calling the function 3 separate times, once for each column?
Upvotes: 4
Views: 249
Reputation: 51165
You can use map + value_counts (which you have already found; you just need to assign the result back to your DataFrame).
df['colA'].map(df['colA'].value_counts())
0    3
1    3
2    3
3    2
4    2
Name: colA, dtype: int64
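The function in the question left df unchanged because col = col.map(freq) only rebinds the local name col inside the function; Series.map returns a new Series rather than modifying the column in place. A minimal sketch of assigning the result back, where frequency_encode is a hypothetical helper name and not something from the question:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

def frequency_encode(col):
    """Return a new Series with each value replaced by its count in col.

    Hypothetical helper for illustration; it does not modify col.
    """
    return col.map(col.value_counts())

# Assign the returned Series back so the change is visible in df.
df['colA'] = frequency_encode(df['colA'])
print(df['colA'].tolist())
```

Returning the whole Series (rather than col.head()) matters here, since head() would keep only the first five rows of a longer column.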
To do this for all columns (this creates a new DataFrame rather than modifying df):
pd.concat([
    df[col].map(df[col].value_counts()) for col in df
], axis=1)
   colA  colB  colC
0     3     2     2
1     3     2     2
2     3     2     3
3     2     1     3
4     2     2     3
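An alternative that should give the same result is DataFrame.apply, which calls a function once per column, so the per-column mapping does not have to be repeated by hand. This is only a sketch; the name encoded is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

# apply passes each column (a Series) to the lambda in turn;
# each call returns a Series of counts with the same index.
encoded = df.apply(lambda col: col.map(col.value_counts()))
print(encoded)
```

To overwrite the original instead of keeping both, assign the result back with df = encoded. Note that value_counts skips missing values by default, so any NaN entries would stay NaN after the mapping.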
Upvotes: 3
Reputation: 323226
You can use groupby + transform:
df['new'] = df.groupby('colA')['colA'].transform('count')
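The same idea can be extended to every column, for example with a dict comprehension. This is a sketch rather than part of the answer above, and the name encoded is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

# For each column, group the rows by that column's values and broadcast
# the group size back to every row of the group.
encoded = pd.DataFrame({
    col: df.groupby(col)[col].transform('count') for col in df.columns
})
print(encoded)
```

Building a separate DataFrame keeps the original values in df available while each column is being grouped.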
Upvotes: 3