Reputation: 5117
I am doing the following:
# Load date
data = pd.read_csv('C:/Users/user/Desktop/STOCKS.txt', keep_default_na=True, sep='\t', nrows=5)
# Convert dates from object columns to datetime columns
data['DATE'] = pd.to_datetime(data_orders['DATE'])
print(data.columns)
# Index(['COUNTRY_ID', 'STOCK_ID', 'DATE', 'STOCK_VALUE'], dtype='object')
# Count of stock per country per day
data_agg= data.groupby(['COUNTRY_ID'], as_index=False).agg({'DATE': 'count'})
print(data_agg.columns)
# Index(['COUNTRY_ID', 'DATE'], dtype='object')
# Rename count column
data_agg.rename({'DATE': 'Count'}, inplace=True)
print(data_agg.columns)
# Index(['COUNTRY_ID', 'DATE'], dtype='object')
As you see above at the last lines, I try to rename the aggregated column after the groupby but for some reason this does not work (I still get the name DATE
for this column instead of Count
).
How can I fix this?
Upvotes: 1
Views: 401
Reputation: 323
I thinks this solves your problem
data_agg=data_agg.rename(columns={'Date':'Count'})
Upvotes: 1
Reputation: 863281
Need columns
keyword, if omit it, rename
try change values of index:
data_agg.rename(columns={'DATE': 'Count'}, inplace=True)
rng = pd.date_range('2017-04-03', periods=10)
data = pd.DataFrame({'DATE': rng, 'COUNTRY_ID': [3]*3+ [4]*5 + [1]*2})
print (data)
DATE COUNTRY_ID
0 2017-04-03 3
1 2017-04-04 3
2 2017-04-05 3
3 2017-04-06 4
4 2017-04-07 4
5 2017-04-08 4
6 2017-04-09 4
7 2017-04-10 4
8 2017-04-11 1
9 2017-04-12 1
data_agg= data.groupby(['COUNTRY_ID'], as_index=False).agg({'DATE': 'count'})
data_agg.rename({'DATE': 'Count', 1:'aaa'}, inplace=True)
print (data_agg)
COUNTRY_ID DATE
0 1 2
aaa 3 3
2 4 5
data_agg.rename(columns={'DATE': 'Count', 1:'aaa'}, inplace=True)
print (data_agg)
COUNTRY_ID Count
0 1 2
1 3 3
2 4 5
Another solution is remove as_index=False
and use DataFrameGroupBy.count
with Series.reset_index
with :
data_agg= data.groupby('COUNTRY_ID')['DATE'].count().reset_index(name='Count')
print (data_agg)
COUNTRY_ID Count
0 1 2
1 3 3
2 4 5
Upvotes: 4