quant

Reputation: 4482

How to select the rows where a value first occurs per group in pandas, keeping only the first of any consecutive occurrences

I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,0,1,0,1,0,1]})

This df is already ordered by id and time.

For each id, I would like to keep only the rows where taken has the value 1, and if there are two (or more) consecutive 1s in the column taken, keep only the first of them.

My output df should look like this:

   id  time  taken
1   1     2      1
4   1     5      1
6   2     7      1
8   2     9      1

Second example

To make the question clearer, if the dataframe looks like this:

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,1,1,1,1,0,1]})

Then the output dataframe should look like this:

   id  time  taken
1   1     2      1
5   2     4      1
8   2     9      1

Any ideas on how to do this?

Upvotes: 2

Views: 338

Answers (4)

rhug123

Reputation: 8768

Here is an option:

df.loc[df.groupby('id')['taken'].transform(lambda x: x.diff().ne(0)) & df['taken'].eq(1)]

or

df.loc[df.groupby('id')['taken'].diff().ne(0) & df['taken'].eq(1)]

Output:

   id  time  taken
1   1     2      1
4   1     5      1
6   2     7      1
8   2     9      1
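
To see why the second form works, it can help to look at the pieces separately. The following is a minimal sketch on the first example from the question; the names `step`, `changed`, `is_one` and `mask` are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                   'taken': [0, 1, 1, 0, 1, 0, 1, 0, 1]})

# Difference with the previous row inside each id; the first row of each id is NaN.
step = df.groupby('id')['taken'].diff()

# NaN != 0 evaluates to True, so the first row of every id counts as a change.
changed = step.ne(0)
is_one = df['taken'].eq(1)

# Keep rows that are 1 and differ from the previous row of the same id.
mask = changed & is_one
result = df.loc[mask]
print(result)
```

Because the diff is taken per id, a 1 on the first row of an id is kept even if the last row of the previous id was also 1.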

Upvotes: 0

Alexander

Reputation: 109526

Group by id and build a boolean mask that is True where taken is 1 but the previous value of taken within the same id is not 1.

>>> df1 = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,0,1,0,1,0,1]})
>>> df2 = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,1,1,1,1,0,1]})

>>> df = df1  # First example.
>>> df[df.groupby('id')['taken'].shift().ne(True) & df['taken']]
   id  time  taken
1   1     2      1
4   1     5      1
6   2     7      1
8   2     9      1

>>> df = df2  # Second example.
>>> df[df.groupby('id')['taken'].shift().ne(True) & df['taken']]
   id  time  taken
1   1     2      1
5   2     4      1
8   2     9      1
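
If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name `first_of_runs` and its parameters are hypothetical, and it compares against 1 with `.eq(1)` / `.ne(1)` instead of relying on the integer column being combined with `&` directly.

```python
import pandas as pd

def first_of_runs(df, group_col='id', flag_col='taken'):
    """Keep rows where flag_col is 1 and the previous row in the same group is not 1."""
    is_one = df[flag_col].eq(1)
    # shift() within each group leaves NaN on the group's first row,
    # and NaN != 1 is True, so a leading 1 in a group is kept.
    prev_not_one = df.groupby(group_col)[flag_col].shift().ne(1)
    return df[is_one & prev_not_one]

ids = [1, 1, 1, 1, 1, 2, 2, 2, 2]
times = [1, 2, 3, 4, 5, 4, 7, 8, 9]
df1 = pd.DataFrame({'id': ids, 'time': times, 'taken': [0, 1, 1, 0, 1, 0, 1, 0, 1]})
df2 = pd.DataFrame({'id': ids, 'time': times, 'taken': [0, 1, 1, 1, 1, 1, 1, 0, 1]})

print(first_of_runs(df1))  # first example
print(first_of_runs(df2))  # second example
```

As in the question, this assumes the frame is already sorted by id and time.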

Upvotes: 3

yatu

Reputation: 88226

Here's one approach:

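taken_is_one = df.taken.eq(1)
df[taken_is_one & (~taken_is_one.shift(1, fill_value=False)
                   | df.groupby('id').id.shift(1).isna())]

Output for the second example:

   id  time  taken
1   1     2      1
5   2     4      1
8   2     9      1

Where, evaluated on the first example's dataframe:

~taken_is_one.shift(1, fill_value=False)

0     True
1     True
2    False
3    False
4     True
5    False
6     True
7    False
8     True

This is True only where the previous value in taken was not a 1. The second condition, df.groupby('id').id.shift(1).isna(), is True on the first row of each id, so a 1 that starts a new id is kept even if the row before it (which belongs to another id) was also a 1. Combining with & then keeps only the rows where taken is 1 and one of these two conditions holds.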
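
To look at the group-start condition `df.groupby('id').id.shift(1).isna()` in isolation, here is a minimal sketch on the second example from the question; the name `group_start` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                   'taken': [0, 1, 1, 1, 1, 1, 1, 0, 1]})

# Shifting id within each group leaves NaN only on the first row of that group,
# so isna() marks exactly the rows where a new id begins.
group_start = df.groupby('id')['id'].shift(1).isna()
print(group_start.tolist())
# [True, False, False, False, False, True, False, False, False]
```

Rows 0 and 5 are flagged, which is what lets the first row of an id be judged on its own rather than by rows belonging to another id.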

Upvotes: 2

Ben.T

Reputation: 29635

You can do it with several masks. First check where the value in taken is 1. Then check that either the row has the same id as the previous row and taken differs from the previous row, or the row starts a new id:

print(df[df.taken.eq(1) & (df.id.diff().eq(0) & df.taken.diff().ne(0) | df.id.diff().ne(0))])
   id  time  taken
1   1     2      1
4   1     5      1
6   2     7      1
8   2     9      1
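
The one-liner relies on `&` binding more tightly than `|`. A sketch that names each mask and adds the parentheses explicitly may be easier to read; the variable names are illustrative, and like the original it assumes a numeric id column sorted so that equal ids are adjacent.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                   'taken': [0, 1, 1, 0, 1, 0, 1, 0, 1]})

is_one = df['taken'].eq(1)
same_id = df['id'].diff().eq(0)           # previous row has the same id
value_changed = df['taken'].diff().ne(0)  # taken differs from the previous row
new_id = df['id'].diff().ne(0)            # first row of an id (NaN on row 0 also counts)

# Explicit grouping of what the one-liner evaluates implicitly:
# is_one & ((same_id & value_changed) | new_id)
mask = is_one & ((same_id & value_changed) | new_id)
result = df[mask]
print(result)
```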

Upvotes: 1
