Ben

Reputation: 21625

How to make a group id using pandas

R's data.table package has a really convenient .GRP special symbol for generating group index values.

library(data.table)
dt <- data.table(
  Grp=c("a", "z", "a", "f", "f"),
  Val=c(3, 2, 1, 2, 2)
)
dt[, GrpIdx := .GRP, by=Grp]

   Grp Val GrpIdx
1:   a   3      1
2:   z   2      2
3:   a   1      1
4:   f   2      3
5:   f   2      3
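For reference, the rule .GRP follows in this output is: each distinct key gets the next integer in the order it first appears, starting at 1 (a=1, z=2, f=3). A minimal plain-Python sketch of that rule, with a hypothetical helper name that is not part of data.table or pandas:

```python
def first_appearance_ids(keys):
    """Hypothetical helper: number keys 1, 2, 3, ... by order of first appearance."""
    ids = {}   # key -> assigned id
    out = []
    for k in keys:
        if k not in ids:
            ids[k] = len(ids) + 1  # next unused id, 1-based
        out.append(ids[k])
    return out

grp_idx = first_appearance_ids(["a", "z", "a", "f", "f"])
print(grp_idx)
```

Any pandas answer that is meant to reproduce the R column exactly should yield the same list.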

What's the best way to accomplish the same thing using pandas?

import pandas as pd
df = pd.DataFrame({'Grp':["a", "z", "a", "f", "f"], 'Val':[3, 2, 1, 2, 2]})

Upvotes: 2

Views: 5805

Answers (2)

svenkatesh

Reputation: 1192

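With pandas >= 1.1 you can use groupby(...).ngroup(). By default it numbers the groups from 0 in sorted key order, so the ids differ from the 1-based, first-appearance numbering that .GRP produces.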

In your example:

In [39]: df['GrpIdx'] = df.groupby(['Grp']).ngroup()

In [40]: df
Out[40]:
  Grp  Val  GrpIdx
0   a    3       0
1   z    2       2
2   a    1       0
3   f    2       1
4   f    2       1
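A minimal sketch, assuming the goal is to match the R output exactly: passing sort=False to groupby makes ngroup number the groups in order of first appearance, and adding 1 makes the ids 1-based. The column name GrpIdxDT is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Grp': ["a", "z", "a", "f", "f"], 'Val': [3, 2, 1, 2, 2]})

# Default: groups are numbered 0..n-1 in sorted key order (a=0, f=1, z=2).
df['GrpIdx'] = df.groupby('Grp').ngroup()

# sort=False: groups are numbered in order of first appearance (a=0, z=1, f=2);
# +1 shifts to 1-based ids, matching the .GRP column from the question.
df['GrpIdxDT'] = df.groupby('Grp', sort=False).ngroup() + 1

print(df)
```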

Upvotes: 3

Nickil Maveli

Reputation: 29711

You could use rank with method='dense' to give each unique group a consecutive integer id. Dense ranking also works on string columns:

df['GrpIdx'] = df['Grp'].rank(method='dense').astype(int)
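A short sketch of what this produces for the question's data: dense ranking orders the keys lexicographically (a=1, f=2, z=3), so the ids are 1-based like .GRP but follow sorted order rather than first appearance, which swaps the ids for z and f relative to the R output. For a column without missing values it agrees with the default ngroup() numbering plus 1; with missing values, rank leaves NaN, which astype(int) cannot convert.

```python
import pandas as pd

df = pd.DataFrame({'Grp': ["a", "z", "a", "f", "f"], 'Val': [3, 2, 1, 2, 2]})

# rank() returns floats; 'dense' gives ties the same rank and leaves no gaps,
# so each distinct key gets an id from 1..n in sorted order.
ranked = df['Grp'].rank(method='dense').astype(int)

# Default (sorted) ngroup numbering, shifted to start at 1, for comparison.
ngrouped = df.groupby('Grp').ngroup() + 1

print(ranked.tolist())
print((ranked == ngrouped).all())
```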


Upvotes: 2
