Reputation: 1015
Here's an example of DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([
[0, "file_0", 5],
[0, "file_1", 0],
[1, "file_2", 0],
[1, "file_3", 8],
[2, "file_4", 0],
[2, "file_5", 5],
[2, "file_6", 100],
[2, "file_7", 0],
[2, "file_8", 50]
], columns=["case", "filename", "num"])
I wanna select num==0
rows and their previous rows with the same case
value, no matter the num
value of the previous row.
Finally, we should get
case filename num
0 file_0 5
0 file_1 0
1 file_2 0
2 file_4 0
2 file_6 100
2 file_7 0
I have got that I can select the previous row by
df[(df['num']==0).shift(-1).fillna(False)]
However, this doesn't consider the case
value. One solution that came to my mind is group by case
first and then filter data. I have no idea how to code it ...
Upvotes: 1
Views: 230
Reputation: 1015
I figure out the answer by myself:
# create boolean masks which are true when `num` is 0 and previous `case` is the same
mask = (df.case.eq(df.case.shift())) & (df['num']==0)
# concat previous rows and num==0 rows
df_res = pd.concat([df[mask.shift(-1).fillna(False)], df[df['num']==0]]).sort_values(['case', 'filename'])
Upvotes: 1
Reputation: 150785
Do you mean:
df.join(df.groupby('case').shift(-1)
.loc[df['num']==0]
.dropna(how='all').add_suffix('_next'),
how='inner')
Output:
case filename num filename_next num_next
0 0 file_0 0 file_1 0.0
3 2 file_3 0 file_4 100.0
5 2 file_5 0 file_6 50.0
Upvotes: 0
Reputation: 4662
How about merging df ?
df = pd.DataFrame([
[0, "file_0", 0],
[0, "file_1", 0],
[1, "file_2", 0],
[2, "file_3", 0],
[2, "file_4", 100],
[2, "file_5", 0],
[2, "file_6", 50],
[2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]:
case_x filename num_x case_y num_y
0 0 file_0 0 0 0
1 0 file_1 0 0 0
2 1 file_2 0 1 0
3 2 file_3 0 2 0
4 2 file_4 100 2 100
5 2 file_5 0 2 0
6 2 file_6 50 2 50
7 2 file_7 0 2 0
then you can rename columns back
df[['case_x', 'filename', 'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]:
case filename num
0 0 file_0 0
1 0 file_1 0
2 1 file_2 0
3 2 file_3 0
4 2 file_4 100
5 2 file_5 0
6 2 file_6 50
7 2 file_7 0
Upvotes: 0