Reputation: 5781
Let's suppose I have a DataFrame like this:
| id | date       | target_row |
| 1  | 2016-01-01 | 0          |
| 1  | 2016-02-01 | 0          |
| 1  | 2016-03-01 | 0          |
| 1  | 2016-04-01 | 0          |
| 1  | 2016-05-01 | 1          |
| 1  | 2016-06-01 | 0          |
| 1  | 2016-07-01 | 0          |
| 1  | 2016-08-01 | 0          |
My task is to check whether any of the next 4 target_row values, including the current row, is 1; if so, the new column should be 1. This has to be done per id, so groupby('id') is needed.
So the desired output is this:
| id | date       | target_row | next_6_target |
| 1  | 2016-01-01 | 0          | 0             |
| 1  | 2016-02-01 | 0          | 1             |
| 1  | 2016-03-01 | 0          | 1             |
| 1  | 2016-04-01 | 0          | 1             |
| 1  | 2016-05-01 | 1          | 1             |
| 1  | 2016-06-01 | 0          | 0             |
| 1  | 2016-07-01 | 0          | 0             |
| 1  | 2016-08-01 | 0          | 0             |
I tried this way:
df['next_6_target'] = df.groupby('id').rolling(window=6)[['target_row']].max().reset_index(drop=True)
and then shifting the result, but it gives wrong results.
In SQL, the solution would be something like this:
MAX(target_row) OVER (PARTITION BY ID ORDER BY DATE ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING)
Upvotes: 0
Views: 376
Reputation: 734
It would be nice to see where exactly you had problems in accessing/calling the pandas DataFrame.
Here is a possible solution based on rolling(win) and max():
import pandas as pd
d = dict(row = [0,0,0,0,0,0,0,1,0,0,0,0,0,0])
df = pd.DataFrame(d)
win = 6
df['winMax'] = df.rolling(win).max().shift(-win+1)
print(df)
>
row winMax
0 0 0.0
1 0 0.0
2 0 1.0
3 0 1.0
4 0 1.0
5 0 1.0
6 0 1.0
7 1 1.0
8 0 0.0
9 0 NaN
10 0 NaN
11 0 NaN
12 0 NaN
13 0 NaN
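The shift-based approach above leaves NaN at the tail and does not group by id. A minimal sketch of one way to handle both, assuming the question's data and a window of 4 rows (current plus 3 following): reverse each group, take a backward-looking rolling max with min_periods=1, and reverse back. The helper name forward_max and the column name next_4_target are made up for illustration.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "id": [1] * 8,
    "date": pd.date_range("2016-01-01", periods=8, freq="MS"),
    "target_row": [0, 0, 0, 0, 1, 0, 0, 0],
})

win = 4  # current row plus the 3 following rows

# Rolling windows are positional, so make sure rows are ordered within each id
df = df.sort_values(["id", "date"]).reset_index(drop=True)

def forward_max(s, window):
    # Hypothetical helper: reverse the series, take a backward-looking
    # rolling max, then reverse back, which yields a forward-looking max.
    # min_periods=1 lets the shorter windows at the end of a group produce
    # a value instead of NaN.
    return s[::-1].rolling(window, min_periods=1).max()[::-1]

df["next_4_target"] = (
    df.groupby("id")["target_row"]
      .transform(lambda s: forward_max(s, win))
      .astype(int)
)
print(df)
```

Because the windows at the end of each group are simply shorter, no value from another id can leak in, which is the risk when shifting a rolling result computed over the whole frame.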
Upvotes: 1
Reputation: 5781
Here is a possible solution using pandasql. Is there a way to avoid using it?
from pandasql import sqldf
df_final = sqldf("""SELECT id, date, target_row,
MAX(target_row) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING)
FROM df
""")
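One pandas-only route that mirrors the SQL frame clause might be a forward-looking window indexer. The sketch below assumes pandas 1.1 or later, where pandas.api.indexers.FixedForwardWindowIndexer is available; ROWS BETWEEN CURRENT ROW AND N FOLLOWING corresponds to window_size = N + 1. The second id and the column name next_target are made-up illustration data, and N = 3 is taken from the original question.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Question data for id 1, plus a made-up id 2 to show the per-group behaviour
dates_1 = pd.date_range("2016-01-01", periods=8, freq="MS").tolist()
dates_2 = pd.date_range("2016-01-01", periods=3, freq="MS").tolist()
df = pd.DataFrame({
    "id": [1] * 8 + [2] * 3,
    "date": dates_1 + dates_2,
    "target_row": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
})

following = 3  # N in "CURRENT ROW AND N FOLLOWING"
indexer = FixedForwardWindowIndexer(window_size=following + 1)

# Equivalent of ORDER BY date within PARTITION BY id
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Apply the forward-looking rolling max inside each id;
# min_periods=1 keeps the truncated windows at the end of a group
df["next_target"] = (
    df.groupby("id")["target_row"]
      .transform(lambda s: s.rolling(indexer, min_periods=1).max())
      .astype(int)
)
print(df)
```

Running the rolling call inside transform keeps each window within a single id, much like PARTITION BY does in the SQL version.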
Upvotes: 0