I'm analyzing a dataset that contains information about medical bootcamps.
From the start and end dates (day/month/year) of these bootcamps, I've written a function that lists all the months between the start date and the end date:
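import datetime
from dateutil.rrule import rrule, MONTHLY

def list_months_in_date(start_date: str, end_date: str) -> list:
    # Parse strings such as "01-Jun-19" (day-abbreviated month-two-digit year)
    strt_dt = datetime.datetime.strptime(start_date, "%d-%b-%y")
    end_dt = datetime.datetime.strptime(end_date, "%d-%b-%y")
    # Start both ends on the 1st so rrule neither skips short months
    # (a start on the 31st) nor drops the last month (end day < start day)
    dates = rrule(MONTHLY, dtstart=strt_dt.replace(day=1), until=end_dt.replace(day=1))
    # Keep each month name once, in first-seen (chronological) order
    distinct_months = []
    for date in dates:
        month = date.strftime("%B")
        if month not in distinct_months:
            distinct_months.append(month)
    return distinct_months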
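If you'd rather not depend on dateutil, the months can also be enumerated with plain integer arithmetic on (year, month). This is a minimal sketch, assuming the same "%d-%b-%y" date strings; the helper name `months_between` is hypothetical and not part of the original code.

```python
import calendar
import datetime


def months_between(start_date: str, end_date: str, fmt: str = "%d-%b-%y") -> list:
    """Return the distinct month names from start_date to end_date, inclusive.

    Hypothetical helper: an rrule-free alternative to list_months_in_date.
    """
    start = datetime.datetime.strptime(start_date, fmt)
    end = datetime.datetime.strptime(end_date, fmt)
    # Encode each month as one integer so stepping across years is trivial
    first = start.year * 12 + (start.month - 1)
    last = end.year * 12 + (end.month - 1)
    names = []
    for index in range(first, last + 1):
        name = calendar.month_name[index % 12 + 1]
        if name not in names:  # keep each month once, in first-seen order
            names.append(name)
    return names


print(months_between("01-Jun-19", "01-Sep-19"))
# ['June', 'July', 'August', 'September']
```

Because only the year and month are used, the day of the month can never cause a month to be skipped.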
I also have a function that converts a list of month names into seasons:
def list_months_to_season(distinct_months: list) -> list:
    season = []
    autumn = ["September", "October", "November"]
    winter = ["December", "January", "February"]
    summer = ["June", "July", "August"]
    spring = ["March", "April", "May"]
    for month in distinct_months:
        if month in autumn:
            label = "autumn"
        elif month in winter:
            label = "winter"
        elif month in spring:
            label = "spring"
        elif month in summer:
            label = "summer"
        else:
            continue  # skip anything that is not a month name
        # Add each season only once, even if several of its months appear
        if label not in season:
            season.append(label)
    return season
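If month numbers (for example `date.month`) are used instead of names, the if-chain can be replaced with arithmetic. With the same groupings as the lists above, `(month % 12) // 3` gives 0 for December to February, 1 for March to May, 2 for June to August and 3 for September to November. A sketch only; `season_of_month` and `seasons_of_months` are hypothetical names.

```python
SEASONS = ("winter", "spring", "summer", "autumn")


def season_of_month(month: int) -> str:
    """Map a month number (1-12) to a season, using the question's groupings."""
    # 12 % 12 == 0, so Dec/Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    return SEASONS[(month % 12) // 3]


def seasons_of_months(months: list) -> list:
    """Distinct seasons for a list of month numbers, in first-seen order."""
    # dict.fromkeys drops repeats but keeps insertion order (Python 3.7+)
    return list(dict.fromkeys(season_of_month(m) for m in months))


print(seasons_of_months([6, 7, 8, 9]))
# ['summer', 'autumn']
```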
What I need is the seasons (summer, spring, winter, autumn) covered between each camp's start date and end date, so that I get:
|id_medicalcamp|start_date|end_date  |seasons      |
|0010          |01/06/2019|01/09/2020|summer,autumn|
I'm running the following code:
df_med_camps['season_label'] = df_med_camps.apply(lambda data : list_months_in_date(data["Camp_Start_Date"],data["Camp_End_Date"]))
but it raises KeyError: 'Camp_Start_Date'.
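The KeyError comes from `DataFrame.apply` defaulting to `axis=0`: the lambda then receives each column as a Series indexed by row labels, so `data["Camp_Start_Date"]` looks for a row with that label. Passing `axis=1` hands the lambda one row at a time, so column names work as keys. A minimal sketch on a toy one-row frame (values made up, column names taken from the question), with a simple string lambda standing in for the question's functions:

```python
import pandas as pd

# Toy frame with the question's column names; values are made up
df_med_camps = pd.DataFrame(
    {"Camp_Start_Date": ["01-Jun-19"], "Camp_End_Date": ["01-Sep-19"]}
)

# axis=0 (the default) passes each *column* to the lambda, so
# data["Camp_Start_Date"] looks up a row label and raises KeyError
try:
    df_med_camps.apply(lambda data: data["Camp_Start_Date"])
except KeyError as exc:
    print("KeyError:", exc)

# axis=1 passes each *row*, so the column names can be used as keys
df_med_camps["span"] = df_med_camps.apply(
    lambda data: data["Camp_Start_Date"] + " -> " + data["Camp_End_Date"], axis=1
)
print(df_med_camps["span"].tolist())
# ['01-Jun-19 -> 01-Sep-19']
```

The same `axis=1` argument applies to the original `list_months_in_date` call.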
First cast your dates to datetime, then create a dict of seasons and map it by month:
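import pandas as pd

# Parse the day/month/year strings into datetime64 columns
df["start_date"] = pd.to_datetime(df["start_date"], format="%d/%m/%Y")
df["end_date"] = pd.to_datetime(df["end_date"], format="%d/%m/%Y")
s = {6:"Summer", 7:"Summer", 8:"Summer", 9:"Autumn", 10: "Autumn"} #...
# Map the month of every column whose name contains "date" to a season, then join per row
df["label"] = df.filter(like="date").apply(lambda d: d.dt.month.map(s)).agg(", ".join, axis=1)
print(df)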
id_medicalcamp start_date end_date label
0 10 2019-06-01 2020-09-01 Summer, Autumn
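The mapping above only looks at the start month and the end month. If a camp should be labelled with every season it overlaps, one option is to expand each row into its months with `pd.period_range` and map those. A sketch under stated assumptions: the full `SEASON` dict, the helper `seasons_between` and the toy row (values made up) are illustrative, not part of the answer above.

```python
import pandas as pd

# Full month -> season mapping (same groupings as the question's lists)
SEASON = {12: "Winter", 1: "Winter", 2: "Winter",
          3: "Spring", 4: "Spring", 5: "Spring",
          6: "Summer", 7: "Summer", 8: "Summer",
          9: "Autumn", 10: "Autumn", 11: "Autumn"}


def seasons_between(start: pd.Timestamp, end: pd.Timestamp) -> str:
    """Comma-separated seasons for every month from start to end, inclusive."""
    months = pd.period_range(start=start, end=end, freq="M").month
    # dict.fromkeys drops repeats while keeping first-seen order
    return ", ".join(dict.fromkeys(SEASON[m] for m in months))


df = pd.DataFrame({"id_medicalcamp": [10],
                   "start_date": ["01/06/2019"],
                   "end_date": ["01/09/2019"]})
df["start_date"] = pd.to_datetime(df["start_date"], format="%d/%m/%Y")
df["end_date"] = pd.to_datetime(df["end_date"], format="%d/%m/%Y")
df["label"] = [seasons_between(s, e) for s, e in zip(df["start_date"], df["end_date"])]
print(df["label"].tolist())
# ['Summer, Autumn']
```

Unlike the endpoint-only mapping, a range longer than a few months picks up the seasons in between (June 2019 to September 2020 yields all four).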