Reputation: 1167
I have a table that looks like this:
I am trying to figure out what the "first_stop" and "last_stop" is for each "id" based on the "sequence". I have the first_stop done, but I can't figure out the last stop. So I need an additional column that says if it is the max sequence number for each ID. I have tried so many ways and they all end up failing or giving me some errors. I've wasted too much of my day trying to solve this and I know you all will be able to help me out super quick. Thanks for your help.
for index, row in df.iterrows():
first_mask = df['sequence'] == 0
last_mask = df['sequence'] == ???
df['first_stop'] = np.where(first_mask, 'Y', 'N')
df['last_stop'] = np.where(last_mask, 'Y', 'N')
print(df[['id', 'sequence', 'first_stop', 'last_stop']])
The sequence is the order of the stops, so I want the lowest sequence number (first_stop) for each ID, and the highest sequence number (last_stop) for each ID. The lowest sequence number is easy, it's always 0. But how do I calculate the highest sequence number for each ID?
Upvotes: 0
Views: 727
Reputation: 59529
Given that your DataFrame
appears to already be sorted by ['id', 'sequence'] you just need to shift as was pointed out in the comments to check for the last row where the id changes.
df['last_stop'] = df.id != df.id.shift(-1)
Outputs:
id sequence last_stop
0 2 0 False
1 2 1 True
2 3 0 False
3 3 1 False
4 3 2 False
5 3 3 True
6 4 0 False
7 4 1 True
8 5 0 False
9 5 1 True
10 6 0 True
11 7 0 True
12 8 0 False
13 8 1 True
If it is not sorted, you can sort before and apply the same logic. Otherwise a single transform will take care of it:
df['last_stop'] = df.groupby('id').sequence.transform(lambda x: x == max(x))
Upvotes: 1
Reputation: 1167
Alright, I solved it myself, so here is what I did. It's pretty simple and I feel stupid for not figuring it out earlier.
Let me know if anyone else has a better solution.
I created a row that counts the total stops:
df['stop_count'] = df.groupby('id')['id'].transform('count') - 1
Then I checked to see if the sequence number matches the stop_count:
last_mask = df['sequence'] == df['stop_count']
Upvotes: 1