Rocketq

Reputation: 5781

How to make a rolling window look at the following (future) rows in pandas?

Let's suppose I have the following df:

| id | date       | target_row |
|----|------------|------------|
| 1  | 2016-01-01 | 0          |
| 1  | 2016-02-01 | 0          |
| 1  | 2016-03-01 | 0          |
| 1  | 2016-04-01 | 0          |
| 1  | 2016-05-01 | 1          |
| 1  | 2016-06-01 | 0          |
| 1  | 2016-07-01 | 0          |
| 1  | 2016-08-01 | 0          |

My task is to check whether any of the next 4 target_row values, including the current one, is 1; if so, the result should be 1. (A groupby('id') is needed.)

So the desired output is this:

| id | date       | target_row | next_6_target |
|----|------------|------------|---------------|
| 1  | 2016-01-01 | 0          | 0             |
| 1  | 2016-02-01 | 0          | 1             |
| 1  | 2016-03-01 | 0          | 1             |
| 1  | 2016-04-01 | 0          | 1             |
| 1  | 2016-05-01 | 1          | 1             |
| 1  | 2016-06-01 | 0          | 0             |
| 1  | 2016-07-01 | 0          | 0             |
| 1  | 2016-08-01 | 0          | 0             |

I tried the following and then shifted the result, but it gives wrong results:

df['next_6_target'] = df.groupby('id').rolling(window=6)[['target_row']].max().reset_index(drop=True)

In SQL, the solution would be something like this:

MAX(target_row) OVER (PARTITION BY ID ORDER BY DATE ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING)

Upvotes: 0

Views: 376

Answers (2)

Marc

Reputation: 734

It would be nice to see where exactly you had problems accessing/calling the pandas DataFrame.

Here is a possible solution based on rolling(win) and max():

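import pandas as pd
d  = dict(row = [0,0,0,0,0,0,0,1,0,0,0,0,0,0])
df = pd.DataFrame(d)

win = 6
# rolling() looks backward, so shift the result up by win-1 rows
# to turn it into a window over the current and following rows
df['winMax'] = df['row'].rolling(win).max().shift(-win+1)
print(df)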

>
    row  winMax
0     0     0.0
1     0     0.0
2     0     1.0
3     0     1.0
4     0     1.0
5     0     1.0
6     0     1.0
7     1     1.0
8     0     0.0
9     0     NaN
10    0     NaN
11    0     NaN
12    0     NaN
13    0     NaN
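
The snippet above does not group by id and leaves NaN in the last win-1 rows. A minimal sketch of one way to adapt the idea to the question's data, assuming a window of 4 rows (the current row plus 3 following, as in the question's SQL) and that rows can be sorted by date within each id: reverse each group, take an ordinary backward-looking rolling max with min_periods=1, then reverse back. The helper name forward_max and the constant WINDOW are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1] * 8,
    "date": pd.date_range("2016-01-01", periods=8, freq="MS"),
    "target_row": [0, 0, 0, 0, 1, 0, 0, 0],
})

WINDOW = 4  # assumption: current row plus 3 following rows


def forward_max(s: pd.Series) -> pd.Series:
    # Hypothetical helper: a backward-looking window over the reversed
    # series is a forward-looking window over the original one.
    # min_periods=1 lets the shorter windows at the end of each group
    # produce a value instead of NaN.
    return s[::-1].rolling(WINDOW, min_periods=1).max()[::-1]


df = df.sort_values(["id", "date"])
df["next_6_target"] = (
    df.groupby("id")["target_row"].transform(forward_max).astype(int)
)
print(df)
```

Because the rolling happens inside transform, a window never crosses from one id into the next.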

Upvotes: 1

Rocketq

Reputation: 5781

A possible solution using pandasql. Is there a way to avoid using it?

from pandasql import sqldf
df_final = sqldf("""SELECT id, date, target_row,
MAX(target_row) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING) AS next_6_target
FROM df
      """)

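One way that might avoid pandasql, sketched here under the assumption that pandas 1.1 or later is available: pandas.api.indexers.FixedForwardWindowIndexer describes a window that starts at the current row and extends forward, which mirrors ROWS BETWEEN CURRENT ROW AND n FOLLOWING. The window size of 4 (current row plus 3 following) is an assumption taken from the question's description rather than from the query above, and the indexer is applied per group through transform.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1] * 8,
    "date": pd.date_range("2016-01-01", periods=8, freq="MS"),
    "target_row": [0, 0, 0, 0, 1, 0, 0, 0],
})

# Assumption: 4 rows = current row plus 3 following rows.
indexer = FixedForwardWindowIndexer(window_size=4)

df = df.sort_values(["id", "date"])
df["next_6_target"] = (
    df.groupby("id")["target_row"]
      # Roll inside each group so a window never spans two ids;
      # min_periods=1 keeps the truncated windows at the end of a group.
      .transform(lambda s: s.rolling(indexer, min_periods=1).max())
      .astype(int)
)
print(df)
```
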
Upvotes: 0
