Reputation: 629
Let's say there is a dataframe with two columns, where col1
signifies groups.
d = pd.DataFrame({'col1': ['a','a','a','a', 'a', 'b','b'], 'col2': ['nmh','ghb','dfe', 'dfe', 'kil', 'gtr','klm']})
I want to add a third column, which uses the groups in col1
, and the entries in col2
, and adds a linear order, like below:
order = [1,2,3,3,4, 1,2]
d['order'] = order
d
col2
will be mostly unique, if anything is repeating order
column should repeat order number.
I have used groupby
and rank
to no avail. Normally providing method='first'
to rank
method should solve the problem, but gives an error.
Note: The df will be much larger with different number of entries corresponding to each group in col1
. So please provide a generalizable answer.
Upvotes: 3
Views: 2229
Reputation: 210832
In [45]: d['order'] = (d.groupby('col1')['col2']
.transform(lambda x: (x!=x.shift()).cumsum()))
In [46]: d
Out[46]:
col1 col2 order
0 a nmh 1
1 a ghb 2
2 a dfe 3
3 a dfe 3
4 a kil 4
5 b gtr 1
6 b klm 2
or a nicer alternative from @Zero:
In [52]: d.col2.ne(d.col2.shift()).groupby(d.col1).cumsum()
Out[52]:
0 1.0
1 2.0
2 3.0
3 3.0
4 4.0
5 1.0
6 2.0
Name: col2, dtype: float64
Upvotes: 6
Reputation: 323226
Using factorize
d['Order']=d.groupby('col1').col2.transform(lambda x : pd.factorize(x)[0]+1)
d
Out[1641]:
col1 col2 Order
0 a nmh 1
1 a ghb 2
2 a dfe 3
3 a dfe 3
4 a kil 4
5 b gtr 1
6 b klm 2
Upvotes: 3