cs_stackX
cs_stackX

Reputation: 1527

Python (Pandas) updating previous x rows within specified condition

I have data about machine failures. The data is in a pandas dataframe with date, id, failure and previous_30_days columns. The previous_30_days column is currently all zeros. My desired outcome is to populate rows in the previous_30_days column with a '1' if they occur within a 30 day time-span before a failure. I am currently able to do this with the following code:

failure_df = df[(df['failure'] == 1)] # create a dataframe of just failures

for index, row in failure_df.iterrows():
    df.loc[(df['date'] >= (row.date - datetime.timedelta(days=30))) &
           (df['date'] <= row.date) & (df['id'] == row.id), 'previous_30_days'] = 1

Note that I also check for the id match, because dates are repeated in the dataframe, so I cannot assume it is simply the previous 30 rows.

My code works, but the problem is that the dataframe is millions of rows, and this code is too slow at the moment.

Is there a more efficient way to achieve the desired outcome? Any thoughts would be very much appreciated.

Upvotes: 1

Views: 77

Answers (1)

JohnE
JohnE

Reputation: 30424

I'm a little confused about how your code works (or is supposed to work), but this ought to point you in the right direction and can be easily adapted. It will be much faster by avoiding iterrows in favor of vectorized operations (about 7x faster for this small dataframe, it should be a much bigger improvement on your large dataframe).

np.random.seed(123)
df=pd.DataFrame({ 'date':np.random.choice(pd.date_range('2015-1-1',periods=300),20), 
                  'id':np.random.randint(1,4,20) })
df=df.sort(['id','date'])

Now, calculate days between current and previous date (by id).

df['since_last'] = df.groupby('id')['date'].apply( lambda x: x - x.shift() )

Then create your new column based on the number of days to the previous date.

df['previous_30_days'] = df['since_last'] < datetime.timedelta(days=30)

         date  id  since_last previous_30_days
12 2015-02-17   1         NaT            False
6  2015-02-27   1     10 days             True
3  2015-03-25   1     26 days             True
0  2015-04-09   1     15 days             True
10 2015-04-24   1     15 days             True
5  2015-05-04   1     10 days             True
11 2015-05-07   1      3 days             True
8  2015-08-14   1     99 days            False
14 2015-02-02   2         NaT            False
9  2015-04-07   2     64 days            False
19 2015-07-28   2    112 days            False
7  2015-08-03   2      6 days             True
15 2015-08-13   2     10 days             True
1  2015-08-19   2      6 days             True
2  2015-01-18   3         NaT            False
13 2015-03-15   3     56 days            False
18 2015-04-07   3     23 days             True
4  2015-04-17   3     10 days             True
16 2015-04-22   3      5 days             True
17 2015-09-11   3    142 days            False

Upvotes: 1

Related Questions