Reputation: 1026
I want to convert a datetime series to seasons. For example, for months 3, 4, 5 I want to replace them with 2 (spring); for months 6, 7, 8 I want to replace them with 3 (summer), etc.
So, I have this series
id
1 2011-08-20
2 2011-08-23
3 2011-08-27
4 2011-09-01
5 2011-09-05
6 2011-09-06
7 2011-09-08
8 2011-09-09
Name: timestamp, dtype: datetime64[ns]
and this is the code I have been trying to use, but to no avail.
# Get seasons
spring = range(3, 5)
summer = range(6, 8)
fall = range(9, 11)
# winter = everything else
month = temp2.dt.month
season=[]
for _ in range(len(month)):
    if any(x == spring for x in month):
        season.append(2) # spring
    elif any(x == summer for x in month):
        season.append(3) # summer
    elif any(x == fall for x in month):
        season.append(4) # fall
    else:
        season.append(1) # winter
and
for _ in range(len(month)):
    if month[_] == 3 or month[_] == 4 or month[_] == 5:
        season.append(2) # spring
    elif month[_] == 6 or month[_] == 7 or month[_] == 8:
        season.append(3) # summer
    elif month[_] == 9 or month[_] == 10 or month[_] == 11:
        season.append(4) # fall
    else:
        season.append(1) # winter
Neither solution works. Specifically, the first implementation raises an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The second produces a large list with errors. Any ideas, please? Thanks.
Upvotes: 24
Views: 37631
Reputation: 1291
A slightly more generic method that works for an ndarray, a list, or a pandas Series.
df_date = pd.DataFrame(pd.date_range('2024-01-01', '2025-01-01', freq='1D', normalize=True), columns=['date'])
day_shift = 14
df_date['season_int'] = get_season(df_date['date'], day_shift=day_shift)
df_date['season'] = get_season(df_date['date'], day_shift=day_shift, season_names=True)
df_date[df_date['season_int'].diff().abs() > 0]
# date season_int season
# 2024-03-15 1 spring
# 2024-06-15 2 summer
# 2024-09-15 3 fall
# 2024-12-15 0 winter
# or a list
get_season(['2024-01-01 09:04:00', '2024-03-01 09:04:00'], season_names= True)
# array(['winter', 'spring'], dtype='<U6')
# and with mapped season names
get_season(['2024-01-01', '2024-03-01', '2024-06-01', '2024-09-01'], season_names= {0: 'cold', 1: 'flowering', 2: 'hot', 3: 'harvest'} )
# array(['cold', 'flowering', 'hot', 'harvest'], dtype='<U9')
import numpy as np
import pandas as pd
def get_season(dates, month_shift = 0, day_shift = 0, season_names = None):
    """
    Get the season of a given date.
    Parameters
    ----------
    dates : array-like of datetime64
        The dates for which to calculate the season.
    month_shift : int, default 0
        Number of months to shift the calculation.
        - for winter [Dec, Jan, Feb] -> month_shift=0 (default)
        - for winter [Jan, Feb, Mar] -> month_shift=1
    day_shift : int, default 0
        Number of days to shift the calculation. The season starts:
        - at the first day of the month -> day_shift=0 (default)
        - at the second day of the month -> day_shift=1
    season_names : dict or None or bool, default None
        Optionally map the season integers to names.
        None: no mapping
        True: default mapping with dict {0: 'winter', 1: 'spring', 2: 'summer', 3: 'fall'}
        dict: any other mapping. Keys have to be in range(4).
    Returns
    -------
    array of int or str
        The season of each input date.
    Examples
    --------
    day_shift = 14
    df_date = pd.DataFrame(pd.date_range('2024-01-01', '2025-01-01', freq='1D', normalize=True), columns=['date'])
    df_date['season_int'] = get_season(df_date['date'], day_shift=day_shift)
    df_date['season'] = get_season(df_date['date'], day_shift=day_shift, season_names=True)
    df_date[df_date['season_int'].diff().abs() > 0]
    # date season_int season
    # 2024-03-15 1 spring
    # 2024-06-15 2 summer
    # 2024-09-15 3 fall
    # 2024-12-15 0 winter
    """
    if isinstance(dates, pd.Series):
        dates = dates.values
    if isinstance(dates, list):
        dates = np.array(dates).astype('datetime64')
    # Shift the dates back so that a season can start on a day other than the 1st
    dates = dates + np.timedelta64(-day_shift, 'D')
    # Months since 1970-01, reduced to 0..11 (0 = January), then scaled to 0..3.67
    season = (dates.astype('datetime64[M]').astype(int) - month_shift) % 12 / 3
    # Rounding groups [Dec, Jan, Feb] -> 0, [Mar, Apr, May] -> 1, and so on
    season = np.round(season, decimals=0).astype(int) % 4
    # season = season.astype(int)
    if season_names is not None:
        if season_names is True:
            season_names = {0: 'winter', 1: 'spring', 2: 'summer', 3: 'fall'}
        return np.vectorize(season_names.get)(season)
    return season
Upvotes: 0
Reputation: 1
Here is my solution (not ideal for leap years) if you want to convert a date to a season while taking both the month and the day of the month into account. I used an arbitrary non-leap year:
import pandas as pd
df = pd.DataFrame({'Date': pd.date_range('2022-01-01', '2023-01-01', periods=12)})
winter_start = pd.to_datetime("2022-12-21", format = "%Y-%m-%d").dayofyear
spring_start = pd.to_datetime("2022-3-21", format = "%Y-%m-%d").dayofyear
summer_start = pd.to_datetime("2022-6-21", format = "%Y-%m-%d").dayofyear
autumn_start = pd.to_datetime("2022-9-23", format = "%Y-%m-%d").dayofyear
for index, date in df["Date"].items():
    if (date.dayofyear >= winter_start) or (date.dayofyear < spring_start):
        df.at[index, "Season"] = "Winter"
    elif (date.dayofyear >= spring_start) and (date.dayofyear < summer_start):
        df.at[index, "Season"] = "Spring"
    elif (date.dayofyear >= summer_start) and (date.dayofyear < autumn_start):
        df.at[index, "Season"] = "Summer"
    else:
        df.at[index, "Season"] = "Autumn"
Out:
Date Season
0 2022-01-01 00:00:00.000000000 Winter
1 2022-02-03 04:21:49.090909091 Winter
2 2022-03-08 08:43:38.181818182 Winter
3 2022-04-10 13:05:27.272727273 Spring
4 2022-05-13 17:27:16.363636364 Spring
5 2022-06-15 21:49:05.454545456 Spring
6 2022-07-19 02:10:54.545454546 Summer
7 2022-08-21 06:32:43.636363636 Summer
8 2022-09-23 10:54:32.727272728 Autumn
9 2022-10-26 15:16:21.818181820 Autumn
10 2022-11-28 19:38:10.909090912 Autumn
11 2023-01-01 00:00:00.000000000 Winter
Upvotes: 0
Reputation: 406
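import pandas as pd
df = pd.DataFrame({'date': pd.date_range('2000-01-01', '2001-01-01', periods=12)})
# Each key is a tuple of month numbers; each value is the season code
seasons = {(1, 12, 2): 1, (3, 4, 5): 2, (6, 7, 8): 3, (9, 10, 11): 4}
df['m'] = df.date.dt.month
def season(ser):
    # Return the season code whose month tuple contains the given month
    for k in seasons.keys():
        if ser in k:
            return seasons[k]
# Apply the function `season` (not the dict `seasons`) to every month
df['s'] = df.m.apply(season)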
Out[25]:
date m s
0 2000-01-01 00:00:00.000000000 1 1
1 2000-02-03 06:32:43.636363636 2 1
2 2000-03-07 13:05:27.272727273 3 2
3 2000-04-09 19:38:10.909090910 4 2
4 2000-05-13 02:10:54.545454546 5 2
5 2000-06-15 08:43:38.181818182 6 3
6 2000-07-18 15:16:21.818181820 7 3
7 2000-08-20 21:49:05.454545456 8 3
8 2000-09-23 04:21:49.090909092 9 4
9 2000-10-26 10:54:32.727272728 10 4
10 2000-11-28 17:27:16.363636364 11 4
11 2001-01-01 00:00:00.000000000 1 1
Upvotes: 2
Reputation: 59
I think a more precise solution may be useful. Given a month (1, ..., 12), we can convert it to a season by subtracting one and integer-dividing by 3:
import pandas as pd
df = pd.Series(["2011-06-07",
                "2011-08-23",
                "2011-08-27",
                "2011-09-01",
                "2011-09-05",
                "2011-09-06",
                "2011-09-08",
                "2011-12-25"])
df = pd.to_datetime(df)
# Months 1-3 -> 0, 4-6 -> 1, 7-9 -> 2, 10-12 -> 3
season = (df.dt.month - 1) // 3
This maps months 1, 2, 3 to 0 (winter), 4, 5, 6 to 1 (spring), 7, 8, 9 to 2 (summer), and 10, 11, 12 to 3 (fall). However, months 3, 6, 9, and 12 each straddle two seasons. I propose the following adjustment:
If the month is 3 and the day is greater than or equal to 20, the season is spring, and we need to add 1. If the month is 6 and the day is greater than or equal to 21, the season is summer, and we need to add 1. If the month is 9 and the day is greater than or equal to 23, the season is fall, and we need to add 1. If the month is 12 and the day is greater than or equal to 21, the season is winter, and we need to subtract 3 (equivalently, add 1 modulo 4). Then we have
season += (df.dt.month == 3)&(df.dt.day>=20)
season += (df.dt.month == 6)&(df.dt.day>=21)
season += (df.dt.month == 9)&(df.dt.day>=23)
season -= 3*((df.dt.month == 12)&(df.dt.day>=21)).astype(int)
The solution for this series will be [1,2,2,2,2,2,2,0].
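The same boundary dates can also be handled with a single lookup instead of four separate corrections. The following is only a sketch: the function name `astronomical_season` and the encoding of each date as `month * 100 + day` are assumptions made for illustration, and the start dates (20 March, 21 June, 23 September, 21 December) are the ones used above.

```python
import numpy as np
import pandas as pd


def astronomical_season(dates: pd.Series) -> pd.Series:
    """Return 0 (winter), 1 (spring), 2 (summer) or 3 (fall) per date.

    Hypothetical helper; `dates` must be a datetime64 Series.
    """
    # Encode each date as month * 100 + day, e.g. 20 March -> 320.
    month_day = dates.dt.month * 100 + dates.dt.day
    # Season start dates in the same encoding (assumed fixed every year).
    bounds = np.array([320, 621, 923, 1221])
    # side="right" counts how many start dates are <= the encoded date.
    passed = np.searchsorted(bounds, month_day.to_numpy(), side="right")
    # 0 boundaries passed is winter; 4 passed wraps back to winter (4 % 4 == 0).
    return pd.Series(passed % 4, index=dates.index, name="season")


dates = pd.to_datetime(pd.Series([
    "2011-06-07", "2011-08-23", "2011-08-27", "2011-09-01",
    "2011-09-05", "2011-09-06", "2011-09-08", "2011-12-25",
]))
print(astronomical_season(dates).tolist())
# [1, 2, 2, 2, 2, 2, 2, 0]
```

Changing a season's start date then only requires editing one entry of `bounds`.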
Upvotes: 4
Reputation: 30268
You can use a simple mathematical formula to compress a month to a season, e.g.:
>>> [month%12 // 3 + 1 for month in range(1, 13)]
[1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1]
So for your use-case using vector operations (credit @DSM):
>>> temp2.dt.month%12 // 3 + 1
1 3
2 3
3 3
4 4
5 4
6 4
7 4
8 4
Name: id, dtype: int64
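For anyone who wants to run this end to end, here is a minimal self-contained sketch. The name `temp2` comes from the question; the series itself is rebuilt by hand here, so its exact construction is an assumption.

```python
import pandas as pd

# Rebuild the question's series by hand (assumed to match `temp2`).
temp2 = pd.Series(
    pd.to_datetime([
        "2011-08-20", "2011-08-23", "2011-08-27", "2011-09-01",
        "2011-09-05", "2011-09-06", "2011-09-08", "2011-09-09",
    ]),
    index=pd.RangeIndex(1, 9, name="id"),
    name="timestamp",
)

# month % 12 sends December to 0, so Dec/Jan/Feb fall into the same bucket;
# integer division by 3 groups the months in threes, and + 1 makes the
# result 1-based (1 = winter, 2 = spring, 3 = summer, 4 = fall).
season = temp2.dt.month % 12 // 3 + 1
print(season.tolist())
# [3, 3, 3, 4, 4, 4, 4, 4]
```

No explicit loop is needed because the arithmetic is applied to the whole series at once.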
Upvotes: 51
Reputation: 14689
It's also possible to use a dictionary mapping.
Create a dictionary that maps a month to a season:
In [27]: seasons = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1]
In [28]: month_to_season = dict(zip(range(1,13), seasons))
In [29]: month_to_season
Out[29]: {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4, 12: 1}
Use it to convert the months to seasons
In [30]: df.id.dt.month.map(month_to_season)
Out[30]:
1 3
2 3
3 3
4 4
5 4
6 4
7 4
8 4
Name: id, dtype: int64
Performance: This is fairly fast
In [35]: %timeit df.id.dt.month.map(month_to_season)
1000 loops, best of 3: 422 µs per loop
Upvotes: 8
Reputation: 39
I think this would work.
while True:
    # `date` is the month number (1-12) typed by the user
    date=int(input("Date?"))
    season=""
    if date<4:
        season=1
    elif date<7:
        season=2
    elif date<10:
        season=3
    elif date<13:
        season=4
    else:
        print("This would not work.")
    print(season)
Upvotes: 2