Outcast
Outcast

Reputation: 5117

Cannot rename column after GroupBy

I am doing the following:

# Load date
data = pd.read_csv('C:/Users/user/Desktop/STOCKS.txt', keep_default_na=True, sep='\t', nrows=5)

# Convert dates from object columns to datetime columns
data['DATE'] = pd.to_datetime(data_orders['DATE'])

print(data.columns)
# Index(['COUNTRY_ID', 'STOCK_ID', 'DATE', 'STOCK_VALUE'], dtype='object')

# Count of stock per country per day
data_agg= data.groupby(['COUNTRY_ID'], as_index=False).agg({'DATE': 'count'})

print(data_agg.columns)
# Index(['COUNTRY_ID', 'DATE'], dtype='object')

# Rename count column
data_agg.rename({'DATE': 'Count'}, inplace=True)

print(data_agg.columns)
# Index(['COUNTRY_ID', 'DATE'], dtype='object')

As you see above at the last lines, I try to rename the aggregated column after the groupby but for some reason this does not work (I still get the name DATE for this column instead of Count).

How can I fix this?

Upvotes: 1

Views: 401

Answers (2)

Manoj Kumar Biroj
Manoj Kumar Biroj

Reputation: 323

I thinks this solves your problem

data_agg=data_agg.rename(columns={'Date':'Count'})

Upvotes: 1

jezrael
jezrael

Reputation: 863281

Need columns keyword, if omit it, rename try change values of index:

data_agg.rename(columns={'DATE': 'Count'}, inplace=True)

rng = pd.date_range('2017-04-03', periods=10)
data = pd.DataFrame({'DATE': rng, 'COUNTRY_ID': [3]*3+ [4]*5 + [1]*2})  
print (data)
        DATE  COUNTRY_ID
0 2017-04-03           3
1 2017-04-04           3
2 2017-04-05           3
3 2017-04-06           4
4 2017-04-07           4
5 2017-04-08           4
6 2017-04-09           4
7 2017-04-10           4
8 2017-04-11           1
9 2017-04-12           1

data_agg= data.groupby(['COUNTRY_ID'], as_index=False).agg({'DATE': 'count'})

data_agg.rename({'DATE': 'Count', 1:'aaa'}, inplace=True)
print (data_agg)
     COUNTRY_ID  DATE
0             1     2
aaa           3     3
2             4     5

data_agg.rename(columns={'DATE': 'Count', 1:'aaa'}, inplace=True)
print (data_agg)
   COUNTRY_ID  Count
0           1      2
1           3      3
2           4      5

Another solution is remove as_index=False and use DataFrameGroupBy.count with Series.reset_index with :

data_agg= data.groupby('COUNTRY_ID')['DATE'].count().reset_index(name='Count')

print (data_agg)
   COUNTRY_ID  Count
0           1      2
1           3      3
2           4      5

Upvotes: 4

Related Questions