I have created a custom function that describes seasonality, and I want to add a new column to a pandas dataframe by applying that function to a column of datetime objects. I'm attempting to create a list that contains the values of the date_season function applied to the dates in the dataframe.
All the variables in the date_season function below are of the type datetime.date, except for 'dif' which is a datetime.timedelta.
Here is the function:
import datetime as dt
import numpy as np
import pandas as pd

def date_season(date):
    year = date.year
    min_season = dt.date(year,1,1)
    max_season = dt.date(year,6,30)
    dif = abs(max_season - date)
    dif_days = dif.days
    x = (((max_season - min_season).days) - dif.days * 2) / (max_season - min_season).days
    seasonality = np.sin(x * (np.pi) / 2)
    return(seasonality)
And here is how the pandas dataframe is created:
start = dt.date(2017,1,1)
end = dt.date(2019,12,31)
df = pd.DataFrame({'Date': pd.date_range(start, end, freq="D")})
Attempting to create a new list with the seasonality values:
z = []
for index, row in df.iterrows():
    z.append(date_season(row.Date))
This returns the error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-105-63e9cb35ed55> in <module>()
1 z = []
2 for index, row in df.iterrows():
----> 3 z.append(date_season(row.Date))
<ipython-input-71-5e2b35e24e38> in date_season(date)
3 min_season = dt.date(year,1,1)
4 max_season = dt.date(year,6,30)
----> 5 dif = abs(max_season - date)
6 dif_days = dif.days
7 x = (((max_season - min_season).days) - dif.days * 2) / (max_season - min_season).days
pandas\_libs\tslibs\timestamps.pyx in
pandas._libs.tslibs.timestamps._Timestamp.__sub__()
TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'datetime.date'
Attempting:
new_df = df.apply(lambda x: date_season(x))
returns
AttributeError: ("'Series' object has no attribute 'year'", 'occurred at index Date')
Not sure why it requires a datetime.datetime object, because the function works with single inputs in the datetime.date format. Is there a simpler way to iterate through the dates and create a new column with the results of this function?
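The two failures have different causes, which this small check illustrates (a sketch assuming a recent pandas; the lambdas only report the type each call receives). DataFrame.apply hands the function a whole column, while Series.apply hands it one element, and each element of a datetime64 column arrives as a pandas Timestamp rather than a datetime.date.

```python
import datetime as dt
import pandas as pd

# Same dataframe as in the question
df = pd.DataFrame({'Date': pd.date_range(dt.date(2017, 1, 1), dt.date(2019, 12, 31), freq="D")})

# DataFrame.apply (default axis=0) calls the function once per column,
# so the lambda receives the entire 'Date' column as a Series.
per_column = df.apply(lambda x: type(x).__name__)
print(per_column['Date'])  # Series

# Series.apply calls the function once per element; each element of a
# datetime64 column is boxed as a pandas Timestamp, not a datetime.date.
per_element = df['Date'].apply(lambda d: type(d).__name__)
print(per_element.iloc[0])  # Timestamp
```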
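The values in your Date column are not datetime.date objects: pd.date_range produces pandas Timestamp objects, which subclass datetime.datetime, and subtracting one from a plain datetime.date is not supported. That is what the error message is complaining about. Define min_season and max_season as datetime-like objects instead, for example pandas Timestamps (a plain datetime.datetime also works). It's confusing, because a datetime.datetime is also a datetime.date, but the two are not interchangeable in arithmetic.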
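The type relationships behind this can be checked directly. This is a minimal sketch using only the standard library and pandas; the variable names are illustrative. It shows that a Timestamp passes isinstance checks for both datetime.datetime and datetime.date, that the standard library itself refuses to subtract a datetime from a plain date, and that datetime-like operands on both sides subtract cleanly.

```python
import datetime as dt
import pandas as pd

ts = pd.date_range('2017-01-01', periods=1, freq='D')[0]

# A pandas Timestamp is a datetime.datetime, which in turn is a datetime.date,
# so isinstance checks against datetime.date pass for it.
print(isinstance(ts, dt.datetime), isinstance(ts, dt.date))  # True True

# Even in the standard library alone, a plain date and a datetime cannot be subtracted.
try:
    dt.date(2017, 6, 30) - dt.datetime(2017, 1, 1)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# Datetime-like operands on both sides work and give a Timedelta.
gap = pd.Timestamp(2017, 6, 30) - ts
print(gap.days)  # 180
```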
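import numpy as np
import pandas as pd

def date_season(date):
    year = date.year
    # use pandas Timestamps (datetime-like) instead of datetime.date
    # (pd.datetime was only an alias of datetime.datetime and was removed in pandas 2.0)
    min_season = pd.Timestamp(year,1,1)
    max_season = pd.Timestamp(year,6,30)
    dif = abs(max_season - date)
    dif_days = dif.days
    x = (((max_season - min_season).days) - dif.days * 2) / (max_season - min_season).days
    seasonality = np.sin(x * (np.pi) / 2)
    return(seasonality)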
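Since the question also asks for a simpler way than iterating, the same formula can be computed on the whole column at once with the .dt accessor. This is a sketch, not part of the original answer: date_season_vec is a hypothetical name, and building each date's Jan 1 and Jun 30 with pd.to_datetime from year/month/day columns is one assumed approach among several.

```python
import datetime as dt
import numpy as np
import pandas as pd

def date_season_vec(dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized version of date_season for a datetime64 Series."""
    years = dates.dt.year
    # Jan 1 and Jun 30 of each date's own year, aligned with `dates`.
    min_season = pd.to_datetime(pd.DataFrame({'year': years, 'month': 1, 'day': 1}))
    max_season = pd.to_datetime(pd.DataFrame({'year': years, 'month': 6, 'day': 30}))
    span = (max_season - min_season).dt.days       # days from Jan 1 to Jun 30
    dif = (max_season - dates).abs().dt.days       # distance to Jun 30 in days
    x = (span - dif * 2) / span
    return np.sin(x * np.pi / 2)

df = pd.DataFrame({'Date': pd.date_range(dt.date(2017, 1, 1), dt.date(2019, 12, 31), freq="D")})
df['Seasonality'] = date_season_vec(df['Date'])
print(df.head())
```

This avoids a Python-level function call per row, which matters more as the dataframe grows.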
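Now you can apply it element-wise, either with applymap on the whole dataframe (renamed DataFrame.map in pandas 2.1) or with apply on a single column. Note that DataFrame.apply passes whole columns to the function, which is why df.apply(lambda x: date_season(x)) failed with "'Series' object has no attribute 'year'"; Series.apply passes one value at a time. To store the result as a new column, assign it back:
new_df = df.applymap(date_season)  # pandas >= 2.1: df.map(date_season)
or
df['Seasonality'] = df['Date'].apply(date_season)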
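Another option, not taken from the answer above, is to keep the question's datetime.date logic and convert the column instead: the .dt.date accessor turns each Timestamp into a plain datetime.date, so both operands of the subtraction have the same type. A minimal sketch, assuming the column name 'Seasonality' for illustration:

```python
import datetime as dt
import numpy as np
import pandas as pd

def date_season(date):
    # The question's original logic, expecting datetime.date input.
    year = date.year
    min_season = dt.date(year, 1, 1)
    max_season = dt.date(year, 6, 30)
    dif = abs(max_season - date)
    x = (((max_season - min_season).days) - dif.days * 2) / (max_season - min_season).days
    return np.sin(x * np.pi / 2)

df = pd.DataFrame({'Date': pd.date_range(dt.date(2017, 1, 1), dt.date(2019, 12, 31), freq="D")})
# .dt.date converts each Timestamp to a plain datetime.date (object dtype),
# so the subtraction inside date_season is date minus date.
df['Seasonality'] = df['Date'].dt.date.apply(date_season)
print(df.head())
```

The trade-off is an extra conversion to Python objects; the Timestamp-based function avoids that step.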