Reputation: 15
What is the best way to find the largest value after date of each row, for example i have this dataframe:
import pandas as pd
data = [[20200101, 10], [20200102, 16], [20200103, 14], [20200104, 18]]
df = pd.DataFrame(data, columns=['date', 'value'])
print(df)
date value
0 20200101 10
1 20200102 16
2 20200103 14
3 20200104 18
i need to get the first largest value date after each row date :
date value largest_value_date
0 20200101 10 20200102
1 20200102 16 20200104
2 20200103 14 20200104
3 20200104 18 0
of course i tried with "for" but in big data it's very slow:
df['largest_value_date'] = 0
for i in range(0, len(df)):
date = df['date'].iloc[i]
value = df['value'].iloc[i]
largestDate = df[(df['date'] > date) & (df['value'] > value)]
if len(largestDate) > 0:
df['largest_value_date'].iloc[i] = largestDate['date'].iloc[0]
print(df)
date value largest_value_date
0 20200101 10 20200102
1 20200102 16 20200104
2 20200103 14 20200104
3 20200104 18 0
Upvotes: 1
Views: 199
Reputation: 323236
We can speed up the whole process with numpy
board cast then idxmax
, get the most recent values' id
greater than the current row , then assign it back
s = df['value'].values
idx = pd.DataFrame(np.triu(s-s[:,None])).gt(0).idxmax(1)
df['new'] = df['date'].reindex(idx.replace(0,-1)).values
df
Out[158]:
date value new
0 20200101 10 20200102.0
1 20200102 16 20200104.0
2 20200103 14 20200104.0
3 20200104 18 NaN
Upvotes: 1