Reputation: 4787
I have a dataset that looks like this:
OwnerID GroupID AssignmentID ... <few more columns> [Need this column]
1 10 100 1
1 10 100 1
1 10 200 2
1 20 100 1
1 20 200 2
1 20 300 3
2 30 200 1
2 30 200 1
2 40 300 2
I would like to populate a column based on the values in OwnerID
, GroupID
and AssignmentID
fields. If the values in these fields are the same across rows, then I want the number 1
repeated in the new column. But if the same owner has assigned a different assignment to the same group, then the value in the new column should increment.
For example - OwnerID
1
assigned 2 assignments (2 with the same AssignmentID
100 and the other with AssignmentID
200). The AssignmentID
100 gets value 1
in both cases because the values for OwnerID
, GroupID
and AssignmentID
are the same, but gets the value 2 when the AssignmentID
is 200.
Similarly, when OwnerID
100 assigned AssignmentID
100, 200 and 300, the group that these assignments were assigned to, had changed to 20.
I thought this could be done using the following code:
AssignmentDetails['colname'] = AssignmentDetails.groupby(['ownerid','groupid','assignmentid']).cumcount()
But this doesn't give me the required result. It doesn't repeat the value in the new column when the values in the 'groupby' clause are the same, but it increments the values.
How do I achieve this? Any help would be great.
Upvotes: 2
Views: 139
Reputation: 323226
Or
df.groupby([ 'OwnerID' , 'GroupID' ]).AssignmentID.transform(lambda x: x.astype('category').cat.codes.add(1))
Out[186]:
0 1
1 1
2 2
3 1
4 2
5 3
6 1
7 1
8 1
Name: AssignmentID, dtype: int8
Upvotes: 1
Reputation: 294258
df.assign(
result=df.groupby(
['OwnerID', 'GroupID']
).AssignmentID.transform(lambda x: x.factorize()[0]) + 1
)
OwnerID GroupID AssignmentID Result result
0 1 10 100 1 1
1 1 10 100 1 1
2 1 10 200 2 2
3 1 20 100 1 1
4 1 20 200 1 2
5 1 20 300 1 3
6 2 30 200 1 1
7 2 30 200 1 1
8 2 40 300 2 1
Upvotes: 2