Raven

Reputation: 859

Conditionally Add Day to Date

I'm sure I'm missing something simple, but I'm trying to conditionally add 1 day to my date. I have a dataframe DF with variables ID, FIRST, LAST, FORMATTEDDATE. When a string in the LAST column contains an "a", I would like to add 1 day to FORMATTEDDATE; if it does not contain an "a", I would like to keep FORMATTEDDATE as is.

The dtypes for the variables of interest are:

Last                     object
FormattedDate    datetime64[ns]

CURRENT DATASET

ID  LAST  FormattedDate
1   7a    2020-01-01
2   7p    2020-01-01
3   2a    2020-01-01

DESIRED DATASET

ID  LAST  DateUpdate
1   7a    2020-01-02
2   7p    2020-01-01
3   2a    2020-01-02

To get my feet wet in adding dates, I wrote the following code which works:

DF["DateUpdate"] = DF.FormattedDate + timedelta(days=1)  # add your timedelta

However, as explained above, I would like to apply this conditionally. To do so, I wrote the following code:

DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),FormattedDate + timedelta(days=1),FormattedDate)

When running this code, I get an error:

NameError                                 Traceback (most recent call last)
<ipython-input-169-57f35bb7a1b2> in <module>
----> 1 DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),FormattedDate + timedelta(days=1),FormattedDate)

NameError: name 'FormattedDate' is not defined

I'm reading about this error here, but I'm not quite sure why it's popping up, because the FormattedDate variable is housed within my dataframe: https://www2.cs.arizona.edu/people/mccann/errors-python#Four

Upvotes: 1

Views: 403

Answers (2)

LTheriault

Reputation: 1230

It looks like you don't actually have a variable called "FormattedDate"; you have a column in your DataFrame labeled with that name. Although you can access a column as an attribute (e.g. df.column), a column is not a standalone Python variable and can't be referenced by name alone unless you explicitly assign that column (a Series) to a variable with the same name. This is part of the reason why I prefer to access columns with df["column"].
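To make the distinction concrete, here is a minimal sketch, assuming a small DataFrame rebuilt from the question's sample rows (the names bare_name_works, by_key, and by_attr are hypothetical and used only for illustration). It suggests that the bare name raises NameError, while both lookups on the DataFrame return the same column.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption, not the real DF).
DF = pd.DataFrame({
    "ID": [1, 2, 3],
    "Last": ["7a", "7p", "2a"],
    "FormattedDate": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# A column label is not a Python variable, so the bare name is undefined.
try:
    FormattedDate
    bare_name_works = True
except NameError:
    bare_name_works = False

# Both of these look the column up on the DataFrame itself.
by_key = DF["FormattedDate"]
by_attr = DF.FormattedDate

print(bare_name_works)
print(by_key.equals(by_attr))
```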

One way to handle this in pandas is the apply() method with axis=1, which evaluates the condition one row at a time.

DF["DateUpdate"] = DF.apply(lambda x: x['FormattedDate'] + timedelta(days=1) 
                            if 'a' in x['Last'] else x['FormattedDate'],
                            axis=1)
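As a rough check, the same apply() call can be run against the question's sample rows. The DataFrame construction below is an assumption reconstructed from the question, not the asker's actual data. Because apply() with axis=1 calls the lambda once per row, it may be slow on large frames.

```python
from datetime import timedelta

import pandas as pd

# Sample data reconstructed from the question (an assumption, not the real DF).
DF = pd.DataFrame({
    "ID": [1, 2, 3],
    "Last": ["7a", "7p", "2a"],
    "FormattedDate": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# Row-wise: add one day only when the 'Last' string contains an 'a'.
DF["DateUpdate"] = DF.apply(
    lambda row: row["FormattedDate"] + timedelta(days=1)
    if "a" in row["Last"]
    else row["FormattedDate"],
    axis=1,
)

print(DF[["ID", "Last", "DateUpdate"]])
```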

Upvotes: 2

BENY

Reputation: 323226

Fix your code by referencing the column through the DataFrame (DF.FormattedDate) instead of the bare name:

DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),DF.FormattedDate + timedelta(days=1),DF.FormattedDate)
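For reference, here is a self-contained sketch of this fix, assuming sample data reconstructed from the question. It uses pd.Timedelta in place of datetime.timedelta and passes regex=False and na=False to str.contains so that the pattern is matched literally and missing values count as "no match"; those are choices made for this illustration, not part of the line above, and has_a is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption, not the real DF).
DF = pd.DataFrame({
    "ID": [1, 2, 3],
    "Last": ["7a", "7p", "2a"],
    "FormattedDate": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# Boolean mask: True where the 'Last' string contains a literal 'a'.
has_a = DF["Last"].str.contains("a", regex=False, na=False)

# Vectorized choice: shifted date where the mask is True, original date otherwise.
DF["DateUpdate"] = np.where(
    has_a,
    DF["FormattedDate"] + pd.Timedelta(days=1),
    DF["FormattedDate"],
)

print(DF[["ID", "Last", "DateUpdate"]])
```

Unlike a row-wise apply(), np.where works on whole columns at once, which tends to scale better as the frame grows.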

Upvotes: 5
