Reputation: 25
I have 2 columns, 1st: Cluster, 2nd: Vehicle
Cluster VehicleID
435 1 2
264 1 1
444 1 1
302 1 1
191 1 1
383 1 1
81 1 1
142 2 1
6 2 1
420 2 1
153 2 1
42 2 2
168 2 1
292 2 2
138 2 2
65 2 2
316 2 1
219 2 1
329 2 1
371 3 1
Basically, this tells cluster 1 has two vehicles:[1,2] and cluster has 1 vehicle. The above table is a small sample. So, I have cluster 1:[1,2], cluster 2:[1] What I want is the cluster 1's vehicle 1 needs to be mapped as 1, 2 as 2. But, cluster 2's vehicle 1 should be mapped as 3.
In short, they should be sequential and irrespective of the "Cluster" column.
I am not able to know where am I going wrong. Kindly help.
Upvotes: 2
Views: 346
Reputation: 59274
You may begin with sorting such that you can take advantage of diff
ing to find when IDs change, and then use cumsum
to retrieve your cumulative IDs.
initial = df.index
df = df.sort_values(['Cluster', 'VehicleID'])
df['new-ID'] = (df.VehicleID.diff().ne(0) | df.Cluster.diff().eq(1)).cumsum()
df.loc[initial] # back to initial ordering
Cluster VehicleID new-ID
435 1 2 2
264 1 1 1
444 1 1 1
302 1 1 1
191 1 1 1
383 1 1 1
81 1 1 1
142 2 1 3
6 2 1 3
420 2 1 3
153 2 1 3
42 2 2 4
168 2 1 3
292 2 2 4
138 2 2 4
65 2 2 4
316 2 1 3
219 2 1 3
329 2 1 3
371 3 1 5
Upvotes: 2