Python pandas find the largest value after date of each row

Question

What is the best way to find the largest value after date of each row, for example i have this dataframe:

import pandas as pd
data = [[20200101, 10], [20200102, 16], [20200103, 14], [20200104, 18]]
df = pd.DataFrame(data, columns=['date', 'value'])
print(df)

       date  value
0  20200101     10
1  20200102     16
2  20200103     14
3  20200104     18

i need to get the first largest value date after each row date :

   date  value           largest_value_date
0  20200101     10                 20200102
1  20200102     16                 20200104
2  20200103     14                 20200104
3  20200104     18                        0

of course i tried with "for" but in big data it's very slow:

df['largest_value_date'] = 0
for i in range(0, len(df)):
    date = df['date'].iloc[i]
    value = df['value'].iloc[i]
    largestDate = df[(df['date'] > date) & (df['value'] > value)]
    if len(largestDate) > 0:
        df['largest_value_date'].iloc[i] = largestDate['date'].iloc[0]
print(df)


      date  value  largest_value_date
0  20200101     10            20200102
1  20200102     16            20200104
2  20200103     14            20200104
3  20200104     18                   0

BENY · Accepted Answer

We can speed up the whole process with numpy board cast then idxmax , get the most recent values' id greater than the current row , then assign it back

s = df['value'].values
idx = pd.DataFrame(np.triu(s-s[:,None])).gt(0).idxmax(1)
df['new'] = df['date'].reindex(idx.replace(0,-1)).values
df
Out[158]: 
       date  value         new
0  20200101     10  20200102.0
1  20200102     16  20200104.0
2  20200103     14  20200104.0
3  20200104     18         NaN

Python pandas find the largest value after date of each row

Answers (1)

Related Questions