The Rookie

Reputation: 939

How do I create a function to perform label encoding?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'], 'colB': ['a', 'b', 'a', 'c', 'b'], 'colC': ['x', 'x', 'y', 'y', 'y']})

I would like to write a function that replaces each value with its frequency count in that column. For example, colA would become [3, 3, 3, 2, 2].

I have attempted to do this by building a dictionary of each value and its frequency count, assigning that dictionary to a variable freq, and then mapping the column values to freq. I have written the following function:

def LabelEncode_method1(col):
    freq = col.value_counts().to_dict()
    col = col.map(freq)
    return col.head()

When I run LabelEncode_method1(df.colA), I get the result 3, 3, 3, 2, 2. However, when I inspect the dataframe df afterwards, the values of colA are still 'a', 'a', 'a', 'b', 'b'.

  1. What am I doing wrong, and how do I fix my function?
  2. How do I write another function that loops through all columns and maps the values to freq, instead of calling the function separately for each of the 3 columns?

Upvotes: 4

Views: 249

Answers (2)

user3483203

Reputation: 51165

You can use map + value_counts, which you have already found. You just need to assign the result back to your DataFrame, because map returns a new Series and does not modify df in place.

df['colA'].map(df['colA'].value_counts())

0    3
1    3
2    3
3    2
4    2
Name: colA, dtype: int64
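
To make the "assign it back" step concrete, here is a minimal sketch of the question's function with the result written back to the column. The name frequency_encode is hypothetical (not from the question), and the sketch returns the whole column rather than col.head(), which would keep only the first five rows.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

def frequency_encode(col):
    # Hypothetical helper: returns a NEW Series in which each value is
    # replaced by how often it occurs in the column. The input is untouched.
    return col.map(col.value_counts())

# The assignment is what actually changes df; calling the function alone does not.
df['colA'] = frequency_encode(df['colA'])
print(df['colA'].tolist())  # [3, 3, 3, 2, 2]
```

Rebinding col inside the function only changes the local name, so df stays the same until the returned Series is assigned to df['colA'].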

To encode all columns at once, which creates a new DataFrame:

pd.concat([
  df[col].map(df[col].value_counts()) for col in df
], axis=1)

   colA  colB  colC
0     3     2     2
1     3     2     2
2     3     2     3
3     2     1     3
4     2     2     3
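
As a possible alternative to pd.concat, the same per-column mapping can be sketched with DataFrame.apply, which calls a function on each column in turn. The function name frequency_encode_all is an assumption for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

def frequency_encode_all(frame):
    # Hypothetical helper: apply() passes each column as a Series, and each
    # column is mapped to its own value counts. A new DataFrame is returned.
    return frame.apply(lambda s: s.map(s.value_counts()))

encoded = frequency_encode_all(df)
print(encoded)
```

Assign the result back (df = frequency_encode_all(df)) if the original values are no longer needed.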

Upvotes: 3

BENY

Reputation: 323226

You can use groupby + transform:

df['new'] = df.groupby('colA')['colA'].transform('count')
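
The line above handles a single column. One way to extend it to every column, sketched here under the assumption that a separate encoded DataFrame is wanted (the name encoded is illustrative), is a dict comprehension over the column names:

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

# For each column, group the rows by that column's values and broadcast the
# group's count back to every row; transform keeps the original index.
encoded = pd.DataFrame({
    col: df.groupby(col)[col].transform('count') for col in df.columns
})
print(encoded)
```

This should give the same frequencies as the map + value_counts approach for data without missing values.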

Upvotes: 3
