Reputation: 859
I'm sure I'm missing something simple, but I'm trying to conditionally add 1 day to my date. I have a dataframe DF with the variables ID, FIRST, LAST, and FORMATTEDDATE. When a string in the LAST column contains an 'a', I would like to add 1 day to FORMATTEDDATE; if it does not contain an 'a', I would like to keep FORMATTEDDATE as is.
The dtypes for the variables of interest are:
Last object
FormattedDate datetime64[ns]
CURRENT DATASET
ID LAST FormattedDate
1 7a 2020-01-01
2 7p 2020-01-01
3 2a 2020-01-01
DESIRED DATASET
ID LAST DateUpdate
1 7a 2020-01-02
2 7p 2020-01-01
3 2a 2020-01-02
To get my feet wet in adding dates, I wrote the following code which works:
DF["DateUpdate"] = DF.FormattedDate+ timedelta(days=1) #add your timedelta
However, as I explained above, I would like to apply this conditionally, and to do so I wrote the following code:
DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),FormattedDate + timedelta(days=1),FormattedDate)
When running this code, I get an error:
NameError Traceback (most recent call last)
<ipython-input-169-57f35bb7a1b2> in <module>
----> 1 DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),FormattedDate + timedelta(days=1),FormattedDate)
NameError: name 'FormattedDate' is not defined
I'm reading about this error here, but I'm not quite sure why it's popping up, because the FormattedDate variable is housed within my dataframe. https://www2.cs.arizona.edu/people/mccann/errors-python#Four
Upvotes: 1
Views: 403
Reputation: 1230
So it looks like you don't actually have a variable called "FormattedDate" in your DataFrame. You have a column labeled with that name. Despite the fact that you can access columns in your DataFrame like a variable (e.g. df.column), it doesn't necessarily behave like a variable and can't be referenced by name alone unless you're manually assigning that column as a Series to a variable with the same name. This is part of the reason why I prefer to access columns with df["column"].
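To make the distinction concrete, here is a minimal sketch. The one-row DataFrame is made up for illustration; only the column name comes from the question. The bare name raises NameError, while the same column reached through the DataFrame works.

```python
import pandas as pd

# Illustrative data; only the column name is taken from the question.
DF = pd.DataFrame({"FormattedDate": pd.to_datetime(["2020-01-01"])})

try:
    FormattedDate  # bare name: Python looks for a variable, not a column
except NameError as err:
    message = str(err)

# The column exists only inside the DataFrame, so go through DF to reach it.
column = DF["FormattedDate"]
```

Here `message` holds the same text as the last line of the traceback in the question.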
A standard way to handle this in pandas would be with the apply() method.
# Row by row: add a day only when 'a' appears in Last
DF["DateUpdate"] = DF.apply(
    lambda x: x['FormattedDate'] + timedelta(days=1) if 'a' in x['Last'] else x['FormattedDate'],
    axis=1,
)
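If the row-wise apply() turns out to be slow on a large frame, a vectorized variant is possible. This is only a sketch, not part of the answer above: it swaps in Series.mask and pd.Timedelta, and the sample rows are copied from the question. The na=False argument is my assumption that missing values in Last should count as "no a".

```python
import pandas as pd

# Sample rows copied from the question.
DF = pd.DataFrame({
    "ID": [1, 2, 3],
    "Last": ["7a", "7p", "2a"],
    "FormattedDate": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# Boolean Series: True where Last contains an "a" (missing values count as False).
has_a = DF["Last"].str.contains("a", na=False)

# mask() replaces values where the condition is True and keeps the rest.
DF["DateUpdate"] = DF["FormattedDate"].mask(
    has_a, DF["FormattedDate"] + pd.Timedelta(days=1)
)
```

Because the result stays a pandas Series throughout, the new column keeps a datetime dtype.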
Upvotes: 2
Reputation: 323226
Fix your code by referencing the column through DF:
DF["DateUpdate"]=np.where(DF["Last"].str.contains("a"),DF.FormattedDate + timedelta(days=1),DF.FormattedDate)
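For anyone who wants to run the fix end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies the same np.where call. The DataFrame construction is my own reconstruction; the question does not show how DF was created.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Reconstruction of the question's sample data (how DF was built is assumed).
DF = pd.DataFrame({
    "ID": [1, 2, 3],
    "Last": ["7a", "7p", "2a"],
    "FormattedDate": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
})

# Every column reference goes through DF, so no bare names are looked up.
DF["DateUpdate"] = np.where(
    DF["Last"].str.contains("a"),
    DF["FormattedDate"] + timedelta(days=1),
    DF["FormattedDate"],
)
```

Rows 1 and 3 move to 2020-01-02, and row 2 stays at 2020-01-01, matching the desired dataset in the question.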
Upvotes: 5