Emac
Emac

Reputation: 1167

Python Pandas max value of a column depending on shared condition

I have a table that looks like this:

My table

I am trying to figure out what the "first_stop" and "last_stop" is for each "id" based on the "sequence". I have the first_stop done, but I can't figure out the last stop. So I need an additional column that says if it is the max sequence number for each ID. I have tried so many ways and they all end up failing or giving me some errors. I've wasted too much of my day trying to solve this and I know you all will be able to help me out super quick. Thanks for your help.

for index, row in df.iterrows():
    first_mask = df['sequence'] == 0
    last_mask = df['sequence'] == ???
    df['first_stop'] = np.where(first_mask, 'Y', 'N')
    df['last_stop'] = np.where(last_mask, 'Y', 'N')
print(df[['id', 'sequence', 'first_stop', 'last_stop']])

The sequence is the order of the stops, so I want the lowest sequence number (first_stop) for each ID, and the highest sequence number (last_stop) for each ID. The lowest sequence number is easy, it's always 0. But how do I calculate the highest sequence number for each ID?

Upvotes: 0

Views: 727

Answers (2)

ALollz
ALollz

Reputation: 59529

Given that your DataFrame appears to already be sorted by ['id', 'sequence'] you just need to shift as was pointed out in the comments to check for the last row where the id changes.

df['last_stop'] = df.id != df.id.shift(-1)

Outputs:

    id  sequence  last_stop
0    2         0      False
1    2         1       True
2    3         0      False
3    3         1      False
4    3         2      False
5    3         3       True
6    4         0      False
7    4         1       True
8    5         0      False
9    5         1       True
10   6         0       True
11   7         0       True
12   8         0      False
13   8         1       True

If it is not sorted, you can sort before and apply the same logic. Otherwise a single transform will take care of it:

df['last_stop'] = df.groupby('id').sequence.transform(lambda x: x == max(x))

Upvotes: 1

Emac
Emac

Reputation: 1167

Alright, I solved it myself, so here is what I did. It's pretty simple and I feel stupid for not figuring it out earlier.

Let me know if anyone else has a better solution.

I created a row that counts the total stops:

df['stop_count'] = df.groupby('id')['id'].transform('count') - 1

Then I checked to see if the sequence number matches the stop_count:

last_mask = df['sequence'] == df['stop_count']

Upvotes: 1

Related Questions