Reputation: 21625
R's data.table package has a really convenient .GRP special symbol for generating group index values.
library(data.table)
dt <- data.table(
Grp=c("a", "z", "a", "f", "f"),
Val=c(3, 2, 1, 2, 2)
)
dt[, GrpIdx := .GRP, by=Grp]
Grp Val GrpIdx
1: a 3 1
2: z 2 2
3: a 1 1
4: f 2 3
5: f 2 3
What's the best way to accomplish the same thing using pandas?
import pandas as pd
df = pd.DataFrame({'Grp':["a", "z", "a", "f", "f"], 'Val':[3, 2, 1, 2, 2]})
Upvotes: 2
Views: 5805
Reputation: 1192
With pandas >= 1.1 you can use groupby.ngroup().
In your example:
In [39]: df['GrpIdx'] = df.groupby(['Grp']).ngroup()
In [40]: df
Out[40]:
Grp Val GrpIdx
0 a 3 0
1 z 2 2
2 a 1 0
3 f 2 1
4 f 2 1
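Note that the numbering above is 0-based and follows the sorted group keys (a, f, z), whereas .GRP in the question is 1-based and follows order of first appearance (a, z, f). If you need the data.table numbering exactly, a minimal sketch is to pass sort=False to groupby and add 1 (this assumes the same df as in the question):

```python
import pandas as pd

df = pd.DataFrame({'Grp': ["a", "z", "a", "f", "f"], 'Val': [3, 2, 1, 2, 2]})

# sort=False numbers groups in order of first appearance instead of sorted key order;
# ngroup() is 0-based, so add 1 to mirror data.table's 1-based .GRP
df['GrpIdx'] = df.groupby('Grp', sort=False).ngroup() + 1

print(df)
```

With this data, a, z and f should come out as 1, 2 and 3, matching the data.table output in the question.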
Upvotes: 3
Reputation: 29711
You could use rank, which accepts string values, with the method arg set to dense to identify unique groups:
df['GrpIdx'] = df['Grp'].rank(method='dense').astype(int)
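Keep in mind that a dense rank numbers the groups in sorted order of their values, so here z ends up as 3 and f as 2. If order of first appearance matters (as with .GRP), one alternative, not part of the answer above, is pd.factorize. A rough sketch, again assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame({'Grp': ["a", "z", "a", "f", "f"], 'Val': [3, 2, 1, 2, 2]})

# factorize returns (codes, uniques); codes are 0-based and assigned
# in order of first appearance, so add 1 for 1-based group indices
codes, uniques = pd.factorize(df['Grp'])
df['GrpIdx'] = codes + 1

print(df)
```

This works on a single column; for grouping on several columns, groupby(...).ngroup() is probably the more natural fit.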
Upvotes: 2