jonboy
jonboy

Reputation: 368

Use index values to copy rows as new columns - pandas

I'm returning indexed values using a specific condition. I'm then aiming to use those values and copy specific rows to new columns.

With the df below, I create a mask of all rows where Item == X. Using the specific int in Value, I then return the next valid index where that int >= 2. I've labelled this index next_high.

I want to use next_high and copy Datetime and Val as new columns but I'm getting an error.

d = ({
    'Datetime' : ['2022-07-25 01:46:00+01:00','2022-07-25 01:47:00+01:00','2022-07-25 01:48:00+01:00','2022-07-25 01:49:00+01:00','2022-07-25 01:50:00+01:00','2022-07-25 01:51:00+01:00','2022-07-25 01:52:00+01:00','2022-07-25 01:53:00+01:00'],                 
    'Val' : [5,1,4,-2,8,4,3,10],   
    'Item' : ['X', np.NaN, np.NaN, np.NaN, 'X', np.NaN, 'X', np.NaN],                      
})

df = pd.DataFrame(data = d)


mask = df['Item'] == 'X'

next_high = np.where(mask, df.apply(lambda row: (df[df.index>=row.name].Val - row.Val)
                    .where(lambda v: v >= 2).first_valid_index(), axis=1), np.NaN)


df.loc[df.index[next_high], 'Val_High'] = df['Val']


df.loc[df.index[next_high], 'Datetime_High'] = df['Datetime']

intended output:

                    Datetime  Val Item              Datetime_High  Val_High
0  2022-07-25 01:46:00+01:00    5    X  2022-07-25 01:50:00+01:00       8.0 # Next Val +2 above 5 is row 4
1  2022-07-25 01:47:00+01:00    1  NaN                        NaN       NaN
2  2022-07-25 01:48:00+01:00    4  NaN                        NaN       NaN
3  2022-07-25 01:49:00+01:00   -2  NaN                        NaN       NaN
4  2022-07-25 01:50:00+01:00    8    X  2022-07-25 01:53:00+01:00      10.0 # Next Val +2 above 8 row 7
5  2022-07-25 01:51:00+01:00    4  NaN                        NaN       NaN
6  2022-07-25 01:52:00+01:00    3    X  2022-07-25 01:53:00+01:00      10.0 # Next Val +2 above 3 row 7
7  2022-07-25 01:53:00+01:00   10  NaN                        NaN       NaN

Upvotes: 1

Views: 72

Answers (1)

Corralien
Corralien

Reputation: 120429

You can use:

df1 = (df[mask].reset_index()
               .merge(df[['Datetime', 'Val']], how='cross', suffixes=(None, '_High'))
               .query('(Val + 2 <= Val_High) & (Datetime < Datetime_High)')
               .drop_duplicates('Datetime').set_index('index').filter(like='_High'))
df = pd.concat([df, df1], axis=1)

Output:

>>> df
                    Datetime  Val Item              Datetime_High  Val_High
0  2022-07-25 01:46:00+01:00    5    X  2022-07-25 01:50:00+01:00       8.0
1  2022-07-25 01:47:00+01:00    1  NaN                        NaN       NaN
2  2022-07-25 01:48:00+01:00    4  NaN                        NaN       NaN
3  2022-07-25 01:49:00+01:00   -2  NaN                        NaN       NaN
4  2022-07-25 01:50:00+01:00    8    X  2022-07-25 01:53:00+01:00      10.0
5  2022-07-25 01:51:00+01:00    4  NaN                        NaN       NaN
6  2022-07-25 01:52:00+01:00    3    X  2022-07-25 01:53:00+01:00      10.0
7  2022-07-25 01:53:00+01:00   10  NaN                        NaN       NaN

Upvotes: 1

Related Questions