Reputation: 4482
I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,0,1,0,1,0,1]})
This df is already ordered with respect to id and time.
I would like to keep only the rows where taken has value 1 and, within each id, if there are two (or more) consecutive 1s in the column taken, keep only the first of them.
My output df should look like this:
id time taken
1 1 2 1
4 1 5 1
6 2 7 1
8 2 9 1
Second example
To make the question clearer, suppose the dataframe looks like this:
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,1,1,1,1,0,1]})
Then the output dataframe should look like this:
id time taken
1 1 2 1
5 2 4 1
8 2 9 1
Any ideas on how to do this?
Upvotes: 2
Views: 338
Reputation: 8768
Here is an option:
df.loc[df.groupby('id')['taken'].transform(lambda x: x.diff().ne(0)) & df['taken'].eq(1)]
or
df.loc[df.groupby('id')['taken'].diff().ne(0) & df['taken'].eq(1)]
Output:
id time taken
1 1 2 1
4 1 5 1
6 2 7 1
8 2 9 1
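As a rough sketch of why the second form works, the mask can be split into named steps and checked against the second example from the question. The names changed and is_one are illustrative only and do not appear in the answer above.

```python
import pandas as pd

# Second example from the question.
df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                   'taken': [0, 1, 1, 1, 1, 1, 1, 0, 1]})

# Difference from the previous row within each id. The first row of each id
# gets NaN, and NaN != 0, so it always counts as "changed".
changed = df.groupby('id')['taken'].diff().ne(0)

# Rows where taken is 1.
is_one = df['taken'].eq(1)

# Keep rows that are 1 and differ from the previous row of the same id.
result = df.loc[changed & is_one]
print(result)
```

On this input the rows kept are those with index 1, 5 and 8, which matches the output requested for the second example.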
Upvotes: 0
Reputation: 109526
Create a boolean mask by grouping on id, then keep the rows where taken is truthy (1) but its previous value within the same id, obtained with shift(), is not.
df1 = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,0,1,0,1,0,1]})
df2 = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2], 'time': [1,2,3,4,5,4,7,8,9], 'taken': [0,1,1,1,1,1,1,0,1]})
df = df1 # First example.
>>> df[df.groupby('id')['taken'].shift().ne(True) & df['taken']]
id time taken
1 1 2 1
4 1 5 1
6 2 7 1
8 2 9 1
df = df2 # Second example.
>>> df[df.groupby('id')['taken'].shift().ne(True) & df['taken']]
id time taken
1 1 2 1
5 2 4 1
8 2 9 1
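If the same filter is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name first_of_each_run and its parameters group_col and flag_col are hypothetical, and it compares against 1 explicitly rather than relying on the integer column being treated as a boolean.

```python
import pandas as pd

def first_of_each_run(frame, group_col='id', flag_col='taken'):
    """Keep rows where flag_col is 1 and the previous row of the same group is not 1."""
    is_one = frame[flag_col].eq(1)
    # Previous value within each group; NaN on each group's first row,
    # and NaN != 1, so a leading 1 in a group is kept.
    previous = frame.groupby(group_col)[flag_col].shift()
    return frame[is_one & previous.ne(1)]

df1 = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                    'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                    'taken': [0, 1, 1, 0, 1, 0, 1, 0, 1]})
df2 = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                    'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                    'taken': [0, 1, 1, 1, 1, 1, 1, 0, 1]})

print(first_of_each_run(df1))
print(first_of_each_run(df2))
```

Like the expressions above, the helper assumes the frame is already sorted by id and time.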
Upvotes: 3
Reputation: 88226
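Here's one approach:
taken_is_one = df.taken.eq(1)
df[taken_is_one & (~taken_is_one.shift(1, fill_value=False)
                   | df.groupby('id').id.shift(1).isna())]
For the second example this gives:
id time taken
1 1 2 1
5 2 4 1
8 2 9 1
Where, for the first example dataframe:
~taken_is_one.shift(1, fill_value=False)
0 True
1 True
2 False
3 False
4 True
5 False
6 True
7 False
8 True
This is True only if the previous value in taken was not a 1. The second term, df.groupby('id').id.shift(1).isna(), is True on the first row of each id, so a 1 that opens a new id is kept even when the previous id ended with a 1. Combining with & then selects the rows where taken is 1 and at least one of these two conditions is True.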
Upvotes: 2
Reputation: 29635
You can do it with several masks. First check where the value is 1 in taken. Then check that either the row has the same id as the previous row and taken differs from the previous row, or the row starts a new id:
print (df[df.taken.eq(1) & (df.id.diff().eq(0) & df.taken.diff().ne(0) | df.id.diff().ne(0))])
id time taken
1 1 2 1
4 1 5 1
6 2 7 1
8 2 9 1
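For readability, the same masks can be given names. The sketch below uses the second example from the question; the variable names are illustrative, and it assumes id is numeric so that diff() is defined on it.

```python
import pandas as pd

# Second example from the question.
df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 5, 4, 7, 8, 9],
                   'taken': [0, 1, 1, 1, 1, 1, 1, 0, 1]})

is_one = df.taken.eq(1)

# Step in id from the previous row: NaN on the very first row,
# 0 inside an id, nonzero where a new id starts.
id_step = df.id.diff()
same_id = id_step.eq(0)
new_id = id_step.ne(0)  # also True on the first row, since NaN != 0

# True where taken differs from the previous row.
taken_changed = df.taken.diff().ne(0)

result = df[is_one & ((same_id & taken_changed) | new_id)]
print(result)
```

On this input the rows kept have index 1, 5 and 8, so the row that opens id 2 survives even though id 1 ended with a 1.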
Upvotes: 1