Stacey
Stacey

Reputation: 5097

Using a where statement to update a data-frame

I have a data-frame (currentQuarter) which looks like

  CODE  SIP  ...                 STATUS  STATUS_CHANGE_DATE
0   AA  YES  ...                CURRENT          2019-01-01
1   AA  YES  ...                 LEAVER          2019-02-05
2   AA  YES  ...                 LEAVER          2019-09-03
3   AA  YES  ...                 JOINER          2019-01-05
4   AA  YES  ...                CURRENT          2019-03-08

I am trying to change the STATUS of any record to CURRENT where the STATUS is not CURRENT and the STATUS_CHANGE_DATE is in the future. So in this example the referencePeriodStartDate is 2019-07-25. Which would mean the third records STATUS field would change to CURRENT:

  CODE  SIP  ...                 STATUS  STATUS_CHANGE_DATE
0   AA  YES  ...                CURRENT          2019-01-01
1   AA  YES  ...                 LEAVER          2019-02-05
2   AA  YES  ...                CURRENT          2019-09-03
3   AA  YES  ...                 JOINER          2019-01-05
4   AA  YES  ...                CURRENT          2019-03-08

I can filter the df to create the query to find the records however I can't get the where statement to change the affected records:

referencePeriodStartDate = datetime.datetime.strptime(referenceDate , '%d/%m/%Y')
    not_current = currentQuarter['STATUS'].isin(leaverList) 
    currentQuarter['STATUS_CHANGE_DATE'] = pd.to_datetime(currentQuarter['STATUS_CHANGE_DATE'], errors='coerce')
    currentQuarter['STATUS_CHANGE_DATE'].dt.strftime('%d/%m/%Y') 
    referenceError = currentQuarter['STATUS_CHANGE_DATE'] > referencePeriodStartDate


currentQuarter['STATUS'] = np.where(not_current & referenceError, "CURRENT", currentQuarter['STATUS'])

Upvotes: 0

Views: 38

Answers (1)

yatu
yatu

Reputation: 88226

You could check which dates in STATUS_CHANGE_DATE are greater than today's, and use the result to index the dataframe and set the STATUS to CURRRENT accordingly:

df['STATUS_CHANGE_DATE'] = pd.to_datetime(df.STATUS_CHANGE_DATE)
df.loc[df.STATUS_CHANGE_DATE.gt(referencePeriodStartDate), 'STATUS'] = 'CURRENT'

   CODE  SIP   STATUS STATUS_CHANGE_DATE
0   AA  YES  CURRENT         2019-01-01
1   AA  YES   LEAVER         2019-02-05
2   AA  YES  CURRENT         2019-09-03
3   AA  YES   JOINER         2019-01-05
4   AA  YES  CURRENT         2019-03-08

Upvotes: 2

Related Questions