I have a CSV file with dates as column headers and a binary matrix of 1, 0, or np.nan.
I'd like to take the mean for each index (row), grouped by month. I am running into a problem because my columns are not a DatetimeIndex; I tried converting them with pd.to_datetime(), with no luck.
binary.csv:
2016-01-01 00:00:00,2016-01-02 00:00:00,2016-02-01 00:00:00,2016-02-02 00:00:00
1,,0,1
0,1,,1
My Code:
import pandas as pd
import numpy as np
df = pd.read_csv('binary.csv')
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d %H:%M:%S')
df = df.groupby(pd.TimeGrouper(freq='M'), axis=0)
print df
Error:
TypeError: axis must be a DatetimeIndex, but got an instance of 'Int64Index'
Desired Output:
2016-01-01 00:00:00 2016-02-01 00:00:00
0 1.0 0.5
1 0.5 1.0
Updated question:
Based on best answer:
If I wanted a single value for each month, is there a more efficient way to do that than this?
pd.DataFrame(data=df.resample('MS', axis=1).mean().mean()).transpose()
Upvotes: 0
Views: 1219
By default, pd.TimeGrouper works on the index (axis=0) so you need to tell it that it should group the columns instead:
df.groupby(pd.TimeGrouper(freq='MS', axis=1), axis=1).mean()
Out:
2016-01-01 2016-02-01
0 1.0 0.5
1 0.5 1.0
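Note that pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0, with pd.Grouper as its replacement, and recent releases also deprecate grouping along axis=1. A minimal sketch of the same grouping for newer versions, assuming the sample data from the question (inlined here through io.StringIO instead of binary.csv) and a transpose so that the dates become the index:

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = (
    "2016-01-01 00:00:00,2016-01-02 00:00:00,"
    "2016-02-01 00:00:00,2016-02-02 00:00:00\n"
    "1,,0,1\n"
    "0,1,,1\n"
)
df = pd.read_csv(io.StringIO(csv_text))
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d %H:%M:%S')

# Transpose so the dates form the index, group by month start,
# take the mean (NaN values are skipped), then transpose back.
monthly = df.T.groupby(pd.Grouper(freq='MS')).mean().T
print(monthly)
```

The two transposes avoid the axis argument entirely, so the same line should also run on older versions that have pd.Grouper.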
You can directly use resample, too:
df.resample('MS', axis=1).mean()
Out:
2016-01-01 2016-02-01
0 1.0 0.5
1 0.5 1.0
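For the updated question (a single value per month), here is a hedged sketch, again assuming the question's sample data inlined through io.StringIO. It transposes the frame so that resample works on the index, which sidesteps the axis=1 argument that recent pandas releases deprecate. "Single value" is ambiguous, so two readings are shown: the mean of the per-row monthly means (what the expression in the question computes) and a pooled mean over all non-missing cells in the month. The variable names are illustrative only.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = (
    "2016-01-01 00:00:00,2016-01-02 00:00:00,"
    "2016-02-01 00:00:00,2016-02-02 00:00:00\n"
    "1,,0,1\n"
    "0,1,,1\n"
)
df = pd.read_csv(io.StringIO(csv_text))
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d %H:%M:%S')

by_date = df.T  # dates as the index, original rows as columns

# Reading 1: average the per-row monthly means (same as the question's code).
row_means = by_date.resample('MS').mean()
mean_of_row_means = row_means.mean(axis=1)

# Reading 2: pool every non-missing cell in the month into one mean.
totals = by_date.resample('MS').sum().sum(axis=1)    # NaN counts as 0 in the sum
counts = by_date.resample('MS').count().sum(axis=1)  # number of non-NaN cells
pooled = totals / counts

print(mean_of_row_means)
print(pooled)
```

The two readings differ whenever rows have different numbers of missing values in a month, so it is worth deciding which one is wanted before optimizing.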
Upvotes: 2