pandas add an order column based on grouping

Question

Let's say there is a dataframe with two columns, where col1 signifies groups.

import pandas as pd

d = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b'],
                  'col2': ['nmh', 'ghb', 'dfe', 'dfe', 'kil', 'gtr', 'klm']})

I want to add a third column, which uses the groups in col1, and the entries in col2, and adds a linear order, like below:

order = [1,2,3,3,4, 1,2]
d['order'] = order
d

col2 will be mostly unique. If a value repeats within a group, the order column should repeat the same order number for it.

I have tried groupby and rank to no avail. Normally, passing method='first' to rank should solve the problem, but it raises an error.

Note: The real df will be much larger, with a different number of entries for each group in col1, so please provide a generalizable answer.

BENY · Accepted Answer

Use pd.factorize within each group. It numbers values in order of first appearance, starting at 0, so add 1:

d['Order'] = d.groupby('col1')['col2'].transform(lambda x: pd.factorize(x)[0] + 1)
d
Out[1641]: 
  col1 col2  Order
0    a  nmh      1
1    a  ghb      2
2    a  dfe      3
3    a  dfe      3
4    a  kil      4
5    b  gtr      1
6    b  klm      2
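
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name order_of_appearance is hypothetical (it is not part of pandas), and the column is named 'order' to match the question rather than 'Order' as above.

```python
import pandas as pd

# Sample frame from the question.
d = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b'],
    'col2': ['nmh', 'ghb', 'dfe', 'dfe', 'kil', 'gtr', 'klm'],
})


def order_of_appearance(values):
    # Hypothetical helper name, used only for this illustration.
    # pd.factorize returns (codes, uniques); codes are 0-based and
    # assigned in order of first appearance, so equal values share a code.
    codes, _ = pd.factorize(values)
    return codes + 1


# transform applies the helper to col2 separately for each col1 group
# and aligns the result back to the original row index.
d['order'] = d.groupby('col1')['col2'].transform(order_of_appearance)

print(d['order'].tolist())  # [1, 2, 3, 3, 4, 1, 2]
```

Since factorize keys on the value itself, a repeated value should get the same number even when the repeats are not on adjacent rows. The numbering restarts at 1 for every group, regardless of how many rows each group has.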
