Reputation: 137
I have a dataframe with a bunch of columns labelled in 'YYYY-MM' format, along with several other columns. I need to collapse the date columns into calendar quarters and take the mean; I was able to do it manually, but there are a few hundred date columns in my real data and I'd like to not have to map every single one of them by hand. I'm generating the initial df from a CSV; I didn't see anything in read_csv
that seemed like it would help, but if there's anything I can leverage there that would be great. I found dataframe.dt.to_period("Q")
that will convert a datetime object to quarter, but I'm not quite sure how to apply that here, if I can at all.
Here's a sample df (code below):
foo bar 2016-04 2016-05 2016-06 2016-07 2016-08
0 6 5 3 3 5 8 1
1 9 3 6 9 9 7 8
2 8 5 8 1 9 9 4
3 5 8 1 2 3 5 6
4 4 5 1 2 7 2 6
This code will do what I'm looking for, but I had to generate mapping by hand:
mapping = {'2016-04':'2016q2', '2016-05':'2016q2', '2016-06':'2016q2', '2016-07':'2016q3', '2016-08':'2016q3'}
df = df.set_index(['foo', 'bar']).groupby(mapping, axis=1).mean().reset_index()
New df:
foo bar 2016q2 2016q3
0 6 5 3.666667 4.5
1 9 3 8.000000 7.5
2 8 5 6.000000 6.5
3 5 8 2.000000 5.5
4 4 5 3.333333 4.0
Code to generate the initial df:
df = pd.DataFrame(np.random.randint(1, 11, size=(5, 7)), columns=('foo', 'bar', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08')) '2016-07', '2016-08'))
Upvotes: 2
Views: 369
Reputation: 1433
data = [[2,2,2,3,3,3],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5]]
df = pd.DataFrame(data, columns = ['A','1996-04','1996-05','2000-07','2000-08','2010-10'])
# separate year columns and other columns
# separate year columns
df3 = df.iloc[:, 1:]
# separate other columns
df2 = df.iloc[:,0]
#apply groupby using period index
df3=df3.groupby(pd.PeriodIndex(df3.columns, freq='Q'), axis=1).mean()
final_df = pd.concat([df3,df2], axis=1)
print(final_df)
Upvotes: 0
Reputation: 30991
The solution is quite short:
Start from copying "monthly" columns to another DataFrame and converting column names to PeriodIndex:
df2 = df.iloc[:, 2:]
df2.columns = pd.PeriodIndex(df2.columns, freq='M')
Then, to get the result, resample columns by quarter, compute the mean (for each quarter) and join with 2 "initial" columns:
df.iloc[:, :2].join(df2.resample('Q', axis=1).agg('mean'))
Upvotes: 0
Reputation: 294318
Use a callable that gets applied to the index values. Use axis=1
to apply it to the column values instead.
(df.set_index(['foo', 'bar'])
.groupby(lambda x: pd.Period(x, 'Q'), axis=1)
.mean().reset_index())
foo bar 2016Q2 2016Q3
0 6 5 3.666667 4.5
1 9 3 8.000000 7.5
2 8 5 6.000000 6.5
3 5 8 2.000000 5.5
4 4 5 3.333333 4.0
Upvotes: 4