zxdawn
zxdawn

Reputation: 1015

Select rows by column value and include previous row by another column value

Here's an example of DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50]
], columns=["case", "filename", "num"])

I wanna select num==0 rows and their previous rows with the same case value, no matter the num value of the previous row.

Finally, we should get

case    filename    num
0   file_0  5
0   file_1  0
1   file_2  0
2   file_4  0
2   file_6  100
2   file_7  0

I have got that I can select the previous row by

df[(df['num']==0).shift(-1).fillna(False)]

However, this doesn't consider the case value. One solution that came to my mind is group by case first and then filter data. I have no idea how to code it ...

Upvotes: 1

Views: 230

Answers (3)

zxdawn
zxdawn

Reputation: 1015

I figure out the answer by myself:

# create boolean masks which are true when `num` is 0 and previous `case` is the same
mask = (df.case.eq(df.case.shift())) & (df['num']==0)

# concat previous rows and num==0 rows
df_res = pd.concat([df[mask.shift(-1).fillna(False)], df[df['num']==0]]).sort_values(['case', 'filename'])

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150785

Do you mean:

df.join(df.groupby('case').shift(-1)
                .loc[df['num']==0]
                .dropna(how='all').add_suffix('_next'), 
        how='inner')

Output:

   case filename  num filename_next  num_next
0     0   file_0    0        file_1       0.0
3     2   file_3    0        file_4     100.0
5     2   file_5    0        file_6      50.0

Upvotes: 0

Surjit Samra
Surjit Samra

Reputation: 4662

How about merging df ?

    df = pd.DataFrame([
    [0, "file_0", 0],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [2, "file_3", 0],
    [2, "file_4", 100],
    [2, "file_5", 0],
    [2, "file_6", 50],
    [2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]: 
   case_x filename  num_x  case_y  num_y
0       0   file_0      0       0      0
1       0   file_1      0       0      0
2       1   file_2      0       1      0
3       2   file_3      0       2      0
4       2   file_4    100       2    100
5       2   file_5      0       2      0
6       2   file_6     50       2     50
7       2   file_7      0       2      0

then you can rename columns back

df[['case_x', 'filename',  'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]: 
   case filename  num
0     0   file_0    0
1     0   file_1    0
2     1   file_2    0
3     2   file_3    0
4     2   file_4  100
5     2   file_5    0
6     2   file_6   50
7     2   file_7    0

Upvotes: 0

Related Questions