Reputation: 506
I have some data in table format, where the rows are years, and the columns are months. I would like to convert this to a time series format in pandas, and then group the data by seasonal mean (I would specifically like to define the winter season as November-March, and discard everything else). Here's the code I have that reads the data into the table format, so you can see what's going on:
import pandas as pd
headers = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data',
delimiter='\s+', header=0, names=headers, skiprows=1, index_col=0)
df = df.drop(df.tail(7).index)
df
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1871 -0.25 -0.58 -0.43 -0.50 -0.70 -0.53 -0.60 -0.33 -0.24 -0.33 -0.31 -0.58
1872 -0.72 -0.62 -0.50 -0.77 -0.62 -0.52 -0.32 -0.85 -1.02 -0.94 -0.79 -0.88
1873 -0.78 -1.01 -1.31 -0.67 -0.53 -0.48 -0.58 -0.39 -0.34 -0.78 -0.77 -0.70
1874 -0.93 -1.06 -1.40 -0.94 -0.86 -0.72 -1.00 -1.05 -1.13 -1.25 -1.33 -1.14
1875 -0.71 -0.37 -0.59 -0.87 -1.09 -0.76 -0.85 -0.81 -0.91 -0.83 -0.64 -0.75
1876 -0.95 -1.20 -1.13 -1.18 -1.08 -0.43 -0.34 -0.16 -0.02 0.11 0.15 0.23
1877 0.35 0.46 0.52 0.50 0.76 0.98 1.42 1.54 1.75 1.95 2.08 2.49
1878 2.41 2.43 1.31 0.92 0.82 0.92 0.25 -0.11 -0.32 -0.53 -0.70 -0.75
1879 -0.55 -0.18 -0.24 -0.37 -0.83 -0.67 -0.77 -0.69 -0.83 -0.93 -1.14 -1.02
1880 -1.00 -0.73 -0.62 -0.57 -0.71 -0.61 -0.53 -0.24 -0.03 0.17 0.24 0.18
1881 0.29 0.23 0.32 0.41 0.16 0.23 -0.26 -0.17 -0.33 -0.43 -0.59 -0.37
1882 -0.45 -0.55 -0.53 0.10 0.01 -0.40 -0.57 -0.33 -0.51 -0.65 -0.92 -0.75
The result I'm going for is a time series of the mean of the November-March values of each winter season (for example, mean of Nov 2018, Dec 2018, Jan 2019, Feb 2019 & March 2019). I'm guessing the process involves first sorting this data into a timeseries (which I am unsure about how to do) and then grouping by season (Nov-March), and dropping the other months.
Upvotes: 2
Views: 2839
Reputation: 62403
pandas.DataFrame.rolling
to specify the windowimport pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# reset index
df.reset_index(inplace=True)
# rename the column to year
df.rename(columns={'index': 'year'}, inplace=True)
# melt df into a long format
df2 = df.melt(id_vars='year', var_name='month')
# create a datatime column and set it as index
df2['date'] = pd.to_datetime(df2['year'].astype(str) + '-' + df2['month'])
df2.set_index('date', inplace=True)
# define a winter dataframe
winter = df2[df2['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])].copy()
# sort the index
winter.sort_index(inplace=True)
year month value
date
1871-01-01 1871 Jan -0.25
1871-02-01 1871 Feb -0.58
1871-03-01 1871 Mar -0.43
1871-11-01 1871 Nov -0.31
1871-12-01 1871 Dec -0.58
1872-01-01 1872 Jan -0.72
1872-02-01 1872 Feb -0.62
1872-03-01 1872 Mar -0.50
1872-11-01 1872 Nov -0.79
1872-12-01 1872 Dec -0.88
1873-01-01 1873 Jan -0.78
1873-02-01 1873 Feb -1.01
1873-03-01 1873 Mar -1.31
1873-11-01 1873 Nov -0.77
1873-12-01 1873 Dec -0.70
1874-01-01 1874 Jan -0.93
1874-02-01 1874 Feb -1.06
1874-03-01 1874 Mar -1.40
1874-11-01 1874 Nov -1.33
1874-12-01 1874 Dec -1.14
1875-01-01 1875 Jan -0.71
1875-02-01 1875 Feb -0.37
1875-03-01 1875 Mar -0.59
1875-11-01 1875 Nov -0.64
1875-12-01 1875 Dec -0.75
1876-01-01 1876 Jan -0.95
1876-02-01 1876 Feb -1.20
1876-03-01 1876 Mar -1.13
1876-11-01 1876 Nov 0.15
1876-12-01 1876 Dec 0.23
1877-01-01 1877 Jan 0.35
1877-02-01 1877 Feb 0.46
1877-03-01 1877 Mar 0.52
1877-11-01 1877 Nov 2.08
1877-12-01 1877 Dec 2.49
1878-01-01 1878 Jan 2.41
1878-02-01 1878 Feb 2.43
1878-03-01 1878 Mar 1.31
1878-11-01 1878 Nov -0.70
1878-12-01 1878 Dec -0.75
1879-01-01 1879 Jan -0.55
1879-02-01 1879 Feb -0.18
1879-03-01 1879 Mar -0.24
1879-11-01 1879 Nov -1.14
1879-12-01 1879 Dec -1.02
1880-01-01 1880 Jan -1.00
1880-02-01 1880 Feb -0.73
1880-03-01 1880 Mar -0.62
1880-11-01 1880 Nov 0.24
1880-12-01 1880 Dec 0.18
1881-01-01 1881 Jan 0.29
1881-02-01 1881 Feb 0.23
1881-03-01 1881 Mar 0.32
1881-11-01 1881 Nov -0.59
1881-12-01 1881 Dec -0.37
1882-01-01 1882 Jan -0.45
1882-02-01 1882 Feb -0.55
1882-03-01 1882 Mar -0.53
1882-11-01 1882 Nov -0.92
1882-12-01 1882 Dec -0.75
# drop incomplete season at the beginning and end of the winter dataframe
winter.drop([pd.Timestamp('1871-01-01'),
pd.Timestamp('1871-02-01'),
pd.Timestamp('1871-03-01'),
pd.Timestamp('1882-11-01'),
pd.Timestamp('1882-12-01')], inplace=True)
# compute mean of season, every 5 rows
yearly_avg = winter.groupby(winter.reset_index().index // 5)['value'].mean()
# create label for season start data
winter_year = pd.date_range(start='1871-11-01', periods=11, freq='12MS')
# create winter_avg dataframe
winter_avg = pd.DataFrame({'winter_start': winter_year, 'season_avg': yearly_avg})
winter_avg.set_index('winter_start', inplace=True)
season_avg
winter_start
1871-11-01 -0.546
1872-11-01 -0.954
1873-11-01 -0.972
1874-11-01 -0.828
1875-11-01 -0.934
1876-11-01 0.342
1877-11-01 2.144
1878-11-01 -0.484
1879-11-01 -0.902
1880-11-01 0.252
1881-11-01 -0.498
# prevent future warning
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
sns.lineplot(winter.index, winter['value'], label='all values')
sns.scatterplot(winter_avg.index, winter_avg['season_avg'], label='season avg', color='orange')
plt.ylabel('value')
plt.xlabel('year')
plt.show()
Upvotes: 1
Reputation: 3097
You can achieve it by following method too.
months = ['Jan' , 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
seasons = 'Nov-Mar'.split('-')
start = months.index(seasons[0])
end = months.index(seasons[1])
season_months1 = []
season_months2 = []
if end >= start:
season_months1 = months[start:end+1]
df['result'] = df[season_months1].mean(axis=1)
if end < start:
season_months1 = months[start:12]
season_months2 = months[0:end+1]
df2 = df[season_months2 + ['Year']]
df2['Year'] = df2['Year'] + 1
df3 = df2.merge(df[season_months1 + ['Year']], how='left')
df3['result'] = df3.mean(axis=1)
df = df.merge(df3[['Year', 'result']], how='left', on='Year')
df
It will give the result as follows
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec result
0 1871 -0.25 -0.58 -0.43 -0.50 -0.70 -0.53 -0.60 -0.33 -0.24 -0.33 -0.31 -0.58 NaN
1 1872 -0.72 -0.62 -0.50 -0.77 -0.62 -0.52 -0.32 -0.85 -1.02 -0.94 -0.79 -0.88 311.511667
2 1873 -0.78 -1.01 -1.31 -0.67 -0.53 -0.48 -0.58 -0.39 -0.34 -0.78 -0.77 -0.70 311.615000
3 1874 -0.93 -1.06 -1.40 -0.94 -0.86 -0.72 -1.00 -1.05 -1.13 -1.25 -1.33 -1.14 311.405000
4 1875 -0.71 -0.37 -0.59 -0.87 -1.09 -0.76 -0.85 -0.81 -0.91 -0.83 -0.64 -0.75 311.703333
5 1876 -0.95 -1.20 -1.13 -1.18 -1.08 -0.43 -0.34 -0.16 -0.02 0.11 0.15 0.23 312.451667
6 1877 0.35 0.46 0.52 0.50 0.76 0.98 1.42 1.54 1.75 1.95 2.08 2.49 313.048333
7 1878 2.41 2.43 1.31 0.92 0.82 0.92 0.25 -0.11 -0.32 -0.53 -0.70 -0.75 312.980000
8 1879 -0.55 -0.18 -0.24 -0.37 -0.83 -0.67 -0.77 -0.69 -0.83 -0.93 -1.14 -1.02 313.831667
9 1880 -1.00 -0.73 -0.62 -0.57 -0.71 -0.61 -0.53 -0.24 -0.03 0.17 0.24 0.18 313.241667
10 1881 0.29 0.23 0.32 0.41 0.16 0.23 -0.26 -0.17 -0.33 -0.43 -0.59 -0.37 312.948333
11 1882 -0.45 -0.55 -0.53 0.10 0.01 -0.40 -0.57 -0.33 -0.51 -0.65 -0.92 -0.75 313.528333
Upvotes: 0