Karl Schneider
Karl Schneider

Reputation: 506

Convert monthly data table to seasonal time series using pandas

I have some data in table format, where the rows are years, and the columns are months. I would like to convert this to a time series format in pandas, and then group the data by seasonal mean (I would specifically like to define the winter season as November-March, and discard everything else). Here's the code I have that reads the data into the table format, so you can see what's going on:

import pandas as pd
headers = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data',
                   delimiter='\s+', header=0, names=headers, skiprows=1, index_col=0)
df = df.drop(df.tail(7).index)
df
         Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58
1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88
1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70
1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14
1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75
1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23
1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49
1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75
1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02
1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18
1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37
1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75

The result I'm going for is a time series of the mean of the November-March values of each winter season (for example, mean of Nov 2018, Dec 2018, Jan 2019, Feb 2019 & March 2019). I'm guessing the process involves first sorting this data into a timeseries (which I am unsure about how to do) and then grouping by season (Nov-March), and dropping the other months.

Upvotes: 2

Views: 2839

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62403

Given the data as shown

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# reset index
df.reset_index(inplace=True)

# rename the column to year
df.rename(columns={'index': 'year'}, inplace=True)

# melt df into a long format
df2 = df.melt(id_vars='year', var_name='month')

# create a datatime column and set it as index
df2['date'] = pd.to_datetime(df2['year'].astype(str) + '-' + df2['month'])
df2.set_index('date', inplace=True)

# define a winter dataframe
winter = df2[df2['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])].copy()

# sort the index
winter.sort_index(inplace=True)

winter dataframe

            year month  value
date                         
1871-01-01  1871   Jan  -0.25
1871-02-01  1871   Feb  -0.58
1871-03-01  1871   Mar  -0.43
1871-11-01  1871   Nov  -0.31
1871-12-01  1871   Dec  -0.58
1872-01-01  1872   Jan  -0.72
1872-02-01  1872   Feb  -0.62
1872-03-01  1872   Mar  -0.50
1872-11-01  1872   Nov  -0.79
1872-12-01  1872   Dec  -0.88
1873-01-01  1873   Jan  -0.78
1873-02-01  1873   Feb  -1.01
1873-03-01  1873   Mar  -1.31
1873-11-01  1873   Nov  -0.77
1873-12-01  1873   Dec  -0.70
1874-01-01  1874   Jan  -0.93
1874-02-01  1874   Feb  -1.06
1874-03-01  1874   Mar  -1.40
1874-11-01  1874   Nov  -1.33
1874-12-01  1874   Dec  -1.14
1875-01-01  1875   Jan  -0.71
1875-02-01  1875   Feb  -0.37
1875-03-01  1875   Mar  -0.59
1875-11-01  1875   Nov  -0.64
1875-12-01  1875   Dec  -0.75
1876-01-01  1876   Jan  -0.95
1876-02-01  1876   Feb  -1.20
1876-03-01  1876   Mar  -1.13
1876-11-01  1876   Nov   0.15
1876-12-01  1876   Dec   0.23
1877-01-01  1877   Jan   0.35
1877-02-01  1877   Feb   0.46
1877-03-01  1877   Mar   0.52
1877-11-01  1877   Nov   2.08
1877-12-01  1877   Dec   2.49
1878-01-01  1878   Jan   2.41
1878-02-01  1878   Feb   2.43
1878-03-01  1878   Mar   1.31
1878-11-01  1878   Nov  -0.70
1878-12-01  1878   Dec  -0.75
1879-01-01  1879   Jan  -0.55
1879-02-01  1879   Feb  -0.18
1879-03-01  1879   Mar  -0.24
1879-11-01  1879   Nov  -1.14
1879-12-01  1879   Dec  -1.02
1880-01-01  1880   Jan  -1.00
1880-02-01  1880   Feb  -0.73
1880-03-01  1880   Mar  -0.62
1880-11-01  1880   Nov   0.24
1880-12-01  1880   Dec   0.18
1881-01-01  1881   Jan   0.29
1881-02-01  1881   Feb   0.23
1881-03-01  1881   Mar   0.32
1881-11-01  1881   Nov  -0.59
1881-12-01  1881   Dec  -0.37
1882-01-01  1882   Jan  -0.45
1882-02-01  1882   Feb  -0.55
1882-03-01  1882   Mar  -0.53
1882-11-01  1882   Nov  -0.92
1882-12-01  1882   Dec  -0.75
  • In this example, the winter of 1870 and 1882 are incomplete, comprised of 3 and 2 months, respectively.
    • Those 5 dates will be dropped
  • Compute the average for each season
# drop incomplete season at the beginning and end of the winter dataframe
winter.drop([pd.Timestamp('1871-01-01'),
             pd.Timestamp('1871-02-01'),
             pd.Timestamp('1871-03-01'),
             pd.Timestamp('1882-11-01'),
             pd.Timestamp('1882-12-01')], inplace=True)

# compute mean of season, every 5 rows
yearly_avg = winter.groupby(winter.reset_index().index // 5)['value'].mean()

# create label for season start data
winter_year = pd.date_range(start='1871-11-01', periods=11, freq='12MS')

# create winter_avg dataframe
winter_avg = pd.DataFrame({'winter_start': winter_year, 'season_avg': yearly_avg})
winter_avg.set_index('winter_start', inplace=True)

              season_avg
winter_start            
1871-11-01        -0.546
1872-11-01        -0.954
1873-11-01        -0.972
1874-11-01        -0.828
1875-11-01        -0.934
1876-11-01         0.342
1877-11-01         2.144
1878-11-01        -0.484
1879-11-01        -0.902
1880-11-01         0.252
1881-11-01        -0.498

Plot the data:

# prevent future warning
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

sns.lineplot(winter.index, winter['value'], label='all values')
sns.scatterplot(winter_avg.index, winter_avg['season_avg'], label='season avg', color='orange')
plt.ylabel('value')
plt.xlabel('year')
plt.show()

enter image description here

Upvotes: 1

Prince Francis
Prince Francis

Reputation: 3097

You can achieve it by following method too.

months = ['Jan' , 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
seasons = 'Nov-Mar'.split('-')
start = months.index(seasons[0])
end = months.index(seasons[1])
season_months1 = []
season_months2 = []
if end >= start:
    season_months1 = months[start:end+1]
    df['result'] = df[season_months1].mean(axis=1)
if end < start:
    season_months1 = months[start:12]
    season_months2 = months[0:end+1]
    df2 = df[season_months2 + ['Year']]
    df2['Year'] = df2['Year'] + 1
    df3 = df2.merge(df[season_months1 + ['Year']], how='left')
    df3['result'] = df3.mean(axis=1)
    df = df.merge(df3[['Year', 'result']], how='left', on='Year')
df

It will give the result as follows

    Year    Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     result
0   1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58   NaN
1   1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88   311.511667
2   1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70   311.615000
3   1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14   311.405000
4   1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75   311.703333
5   1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23    312.451667
6   1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49    313.048333
7   1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75   312.980000
8   1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02   313.831667
9   1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18    313.241667
10  1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37   312.948333
11  1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75   313.528333

Upvotes: 0

Related Questions