jml
jml

Reputation: 137

python/pandas collapsing columns as datetime quarters

I have a dataframe with a bunch of columns labelled in 'YYYY-MM' format, along with several other columns. I need to collapse the date columns into calendar quarters and take the mean; I was able to do it manually, but there are a few hundred date columns in my real data and I'd like to not have to map every single one of them by hand. I'm generating the initial df from a CSV; I didn't see anything in read_csv that seemed like it would help, but if there's anything I can leverage there that would be great. I found dataframe.dt.to_period("Q") that will convert a datetime object to quarter, but I'm not quite sure how to apply that here, if I can at all.

Here's a sample df (code below):

   foo  bar  2016-04  2016-05  2016-06  2016-07  2016-08
0    6    5        3        3        5        8        1
1    9    3        6        9        9        7        8
2    8    5        8        1        9        9        4
3    5    8        1        2        3        5        6
4    4    5        1        2        7        2        6

This code will do what I'm looking for, but I had to generate mapping by hand:

mapping = {'2016-04':'2016q2', '2016-05':'2016q2', '2016-06':'2016q2', '2016-07':'2016q3', '2016-08':'2016q3'}
df = df.set_index(['foo', 'bar']).groupby(mapping, axis=1).mean().reset_index()

New df:

    foo bar 2016q2  2016q3
0   6   5   3.666667    4.5
1   9   3   8.000000    7.5
2   8   5   6.000000    6.5
3   5   8   2.000000    5.5
4   4   5   3.333333    4.0

Code to generate the initial df:

df = pd.DataFrame(np.random.randint(1, 11, size=(5, 7)), columns=('foo', 'bar', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08'))                                                        '2016-07', '2016-08'))

Upvotes: 2

Views: 369

Answers (3)

xpertdev
xpertdev

Reputation: 1433

data = [[2,2,2,3,3,3],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5]]
df = pd.DataFrame(data, columns = ['A','1996-04','1996-05','2000-07','2000-08','2010-10'])
# separate year columns and other columns
# separate year columns
df3 = df.iloc[:, 1:]
# separate other columns
df2 = df.iloc[:,0]
#apply groupby using period index
df3=df3.groupby(pd.PeriodIndex(df3.columns, freq='Q'), axis=1).mean()
final_df = pd.concat([df3,df2], axis=1)
print(final_df)

output is attached in image:

Upvotes: 0

Valdi_Bo
Valdi_Bo

Reputation: 30991

The solution is quite short:

Start from copying "monthly" columns to another DataFrame and converting column names to PeriodIndex:

df2 = df.iloc[:, 2:]
df2.columns = pd.PeriodIndex(df2.columns, freq='M')

Then, to get the result, resample columns by quarter, compute the mean (for each quarter) and join with 2 "initial" columns:

df.iloc[:, :2].join(df2.resample('Q', axis=1).agg('mean'))

Upvotes: 0

piRSquared
piRSquared

Reputation: 294318

Use a callable that gets applied to the index values. Use axis=1 to apply it to the column values instead.

(df.set_index(['foo', 'bar'])
   .groupby(lambda x: pd.Period(x, 'Q'), axis=1)
   .mean().reset_index())

   foo  bar    2016Q2  2016Q3
0    6    5  3.666667     4.5
1    9    3  8.000000     7.5
2    8    5  6.000000     6.5
3    5    8  2.000000     5.5
4    4    5  3.333333     4.0

Upvotes: 4

Related Questions