rohit deraj

Reputation: 95

Label encode subgroups after groupby

I want to label encode subgroups in a pandas dataframe. Something like this:

| Category   | Name   |
| ---------- | ------ |
| FRUITS     | Apple  |
| FRUITS     | Orange |
| FRUITS     | Apple  |
| Vegetables | Onion  |
| Vegetables | Garlic |
| Vegetables | Garlic |

to

| Category   | Name   | Label |
| ---------- | ------ | ----- |
| FRUITS     | Apple  | 1     |
| FRUITS     | Orange | 2     |
| FRUITS     | Apple  | 1     |
| Vegetables | Onion  | 1     |
| Vegetables | Garlic | 2     |
| Vegetables | Garlic | 2     |

Upvotes: 2

Views: 266

Answers (2)

mozway

Reputation: 262164

You can use factorize per group:

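import pandas as pd

# Factorize names within each Category (0-based codes, in order of first appearance), then shift to 1-based.
df['Label'] = (df.groupby('Category')['Name']
               .transform(lambda x: pd.factorize(x)[0])
               .add(1)
               )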

Output:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
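
To try this end to end, here is a minimal self-contained sketch. The way `df` is built is my assumption (the question only shows the table), and the `labels` variable is just there for inspection:

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed construction).
df = pd.DataFrame({
    "Category": ["FRUITS", "FRUITS", "FRUITS",
                 "Vegetables", "Vegetables", "Vegetables"],
    "Name": ["Apple", "Orange", "Apple", "Onion", "Garlic", "Garlic"],
})

# pd.factorize returns (codes, uniques); the codes are 0-based and follow
# order of first appearance, so add 1 to get 1-based labels per Category.
df["Label"] = (
    df.groupby("Category")["Name"]
    .transform(lambda s: pd.factorize(s)[0])
    .add(1)
)

labels = df["Label"].tolist()
print(labels)  # [1, 2, 1, 1, 2, 2]
```

Because `transform` returns a result aligned to the original index, this should not depend on the rows being sorted by Category.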

Upvotes: 0

Andrej Kesely

Reputation: 195553

Group by "Category", then within each group, group by "Name" and use .ngroup():

df["Label"] = (
    df.groupby("Category")
    .apply(lambda x: x.groupby("Name", sort=False).ngroup() + 1)
    .values
)
print(df)

Prints:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
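
One caveat: `.values` assigns the result by position, so the snippet above relies on the rows already being grouped by Category in sorted order, as they are in the question. If the rows may be interleaved, an index-aligned variant of the same `.ngroup()` idea could look like the sketch below. The interleaved sample data and the `number_names` helper are hypothetical names of mine, not part of the original answer:

```python
import pandas as pd

# Same rows as the question, deliberately interleaved (assumed test data).
df = pd.DataFrame({
    "Category": ["Vegetables", "FRUITS", "Vegetables",
                 "FRUITS", "Vegetables", "FRUITS"],
    "Name": ["Onion", "Apple", "Garlic", "Orange", "Garlic", "Apple"],
})


def number_names(names: pd.Series) -> pd.Series:
    """Hypothetical helper: 1-based group number per distinct name,
    in order of first appearance within one Category."""
    return names.groupby(names, sort=False).ngroup() + 1


# transform keeps the original index, so no positional .values is needed.
df["Label"] = df.groupby("Category")["Name"].transform(number_names)

labels = df["Label"].tolist()
print(labels)  # [1, 1, 2, 2, 2, 1]
```

Here Onion and Apple each get 1 as the first name seen in their Category, and Garlic and Orange get 2, regardless of row order.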

Upvotes: 1
