Reputation: 333
I'm using this dataframe
(df1
) :
df1 = pd.read_csv('City_Zhvi_AllHomes.csv',header=None)
Its content contains columns written in the format Year-Months(1-12). Now I have to create a new dataframe
that contains data from Df1
divided in 4 quarters. Here's what I have done:
Month = ['00','01','02','03','04','05','06','07','08','09','10','11','12']
Year=['00','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16']
for i in range(0,15):
for j in range(0,12):
newdf = pd.DataFrame[df1.loc[2,'20'+Year[i]+'-'+Month[j+1]] + df1.loc[2,'20'+Year[i]+'-'+Month[j+2]]+ df1.loc[2,'20'+Year[i]+'-'+Month[j+3]],columns = ['Q1','Q2','Q3','Q4'] ]
j = j + 3
print newdf .
I have done this specifically for second row however I need to perform this for entire dataset
Now I get this error:
'type' object has no attribute 'getitem'
Here's the sample data :
0 CountyName SizeRank 2000-01 2000-02 2000-03 2000-04
1 Queens 1 NaN NaN NaN NaN
2 Los Angeles 2 204400 207000 209800 212300
3 Cook 3 136800 138300 140100 141900
4 Philadelphia 4 52700 53100 53200 53400
newdf should contain :
Q1
(204400+207000+209800)
Similarly Q2,Q3,Q4
How do I proceed?
Upvotes: 1
Views: 195
Reputation: 863291
I think you can use resample
by columns axis=1
with aggregate sum
, but first is necessary set_index
of columns CountyName
and SizeRank
and then convert column names first to_datetime
and then to_period
:
#remove header=None for first row in csv as columns
df = pd.read_csv('City_Zhvi_AllHomes.csv')
df = df.set_index(['CountyName','SizeRank'])
#with real data set index of all not dates columns
#df = df.set_index(['RegionID', 'RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'])
df.columns = pd.to_datetime(df.columns).to_period('M')
print (df)
2000-01 2000-02 2000-03 2000-04
CountyName SizeRank
Queens 1 NaN NaN NaN NaN
Los Angeles 2 204400.0 207000.0 209800.0 212300.0
Cook 3 136800.0 138300.0 140100.0 141900.0
Philadelphia 4 52700.0 53100.0 53200.0 53400.0
print (df.columns)
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04'], dtype='period[M]', freq='M')
df = df.resample('Q', axis=1).sum()
print (df)
2000Q1 2000Q2
CountyName SizeRank
Queens 1 NaN NaN
Los Angeles 2 621200.0 212300.0
Cook 3 415200.0 141900.0
Philadelphia 4 159000.0 53400.0
Upvotes: 1