I am trying to change the names of a few columns in my dataframe. The code below changes the names of all the columns except one. There is no whitespace before or after the name of the misbehaving column ('Tot Cases/1M pop'). I am unable to figure out what the problem is. I'd appreciate any suggestions.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216 entries, 0 to 215
Data columns (total 12 columns):
Country,Other 216 non-null object
TotalCases 216 non-null int64
NewCases 139 non-null object
TotalDeaths 178 non-null float64
NewDeaths 91 non-null object
TotalRecovered 207 non-null float64
ActiveCases 216 non-null int64
Serious,Critical 137 non-null float64
Tot Cases/1M pop 214 non-null float64
Deaths/1M pop 176 non-null float64
TotalTests 179 non-null float64
Tests/ 1M pop 179 non-null float64
dtypes: float64(7), int64(2), object(3)
memory usage: 20.4+ KB
df = df.rename(columns={'Country,Other': 'Country_or_Other',
                        'Serious,Critical': 'Serious_or_Critical',
                        'Tot Cases/1M pop': 'Cases_1M_pop',
                        'Deaths/1M pop': 'Deaths_per_1M_pop',
                        'Tests/ 1M pop': 'Tests_per_1M_pop'})
df.head(3)
Country_or_Other TotalCases NewCases TotalDeaths NewDeaths TotalRecovered ActiveCases Serious_or_Critical Tot Cases/1M pop Deaths_per_1M_pop TotalTests Tests_per_1M_pop
0 World 3481349 83255.0 244663.0 5215.0 1120908.0 2115778 50860.0 447.0 31.4 NaN NaN
1 China 82875 1.0 4633.0 NaN 77685.0 557 37.0 58.0 3.0 NaN NaN
2 USA 1160774 29744.0 67444.0 1691.0 173318.0 920012 16475.0 3507.0 204.0 6931132.0 20940.0
for col in df.columns:
    print(col, len(col))
Country_or_Other 16
TotalCases 10
NewCases 8
TotalDeaths 11
NewDeaths 9
TotalRecovered 14
ActiveCases 11
Serious_or_Critical 19
Tot Cases/1M pop 16
Deaths_per_1M_pop 17
TotalTests 10
Tests_per_1M_pop 16
print (df.columns.tolist())
['Country_or_Other',
'TotalCases',
'NewCases',
'TotalDeaths',
'NewDeaths',
'TotalRecovered',
'ActiveCases',
'Serious_or_Critical',
'Tot\xa0Cases/1M pop',
'Deaths_per_1M_pop',
'TotalTests',
'Tests_per_1M_pop']
print([(i, hex(ord(i))) for i in df.columns[8]])
[('T', '0x54'), ('o', '0x6f'), ('t', '0x74'), ('\xa0', '0xa0'), ('C', '0x43'), ('a', '0x61'), ('s', '0x73'), ('e', '0x65'), ('s', '0x73'), ('/', '0x2f'), ('1', '0x31'), ('M', '0x4d'), (' ', '0x20'), ('p', '0x70'), ('o', '0x6f'), ('p', '0x70')]
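To check every header at once instead of one position at a time, a minimal sketch along these lines could flag any character outside ASCII. The helper name non_ascii_in_columns and the small reproduction frame are hypothetical, built only to mimic the column that misbehaves here.

```python
import pandas as pd

# Hypothetical reproduction: one header contains U+00A0 (non-breaking space)
df = pd.DataFrame(columns=["Country,Other", "Tot\xa0Cases/1M pop", "Deaths/1M pop"])

def non_ascii_in_columns(columns):
    """Return {column name: [(position, hex code point), ...]} for non-ASCII chars."""
    report = {}
    for name in columns:
        hits = [(i, hex(ord(ch))) for i, ch in enumerate(name) if ord(ch) > 127]
        if hits:
            report[name] = hits
    return report

print(non_ascii_in_columns(df.columns))
# → {'Tot\xa0Cases/1M pop': [(3, '0xa0')]}
```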
You could also rename that specific column by addressing its position in the column index directly:
df.columns.values[8] = "New name"
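Writing into df.columns.values modifies the array behind an Index, which pandas treats as immutable. A sketch of a positional rename that goes through rename instead, assuming the column labels are unique (rename maps every column with a matching label), could look like this. The three-column frame is a made-up stand-in; with the question's df the position would be 8.

```python
import pandas as pd

# Made-up frame; the middle header contains an invisible \xa0
df = pd.DataFrame([[1, 2, 3]], columns=["a", "Tot\xa0Cases/1M pop", "c"])

# Fetch the exact label by position and hand it to rename,
# so the \xa0 never has to be typed by hand
df = df.rename(columns={df.columns[1]: "Cases_1M_pop"})
print(df.columns.tolist())  # ['a', 'Cases_1M_pop', 'c']
```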
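You can see the hidden character by printing the column names with print(df.columns.tolist()), which shows 'Tot\xa0Cases/1M pop' rather than 'Tot Cases/1M pop'.
\xa0 is a non-breaking space in Latin-1 (ISO 8859-1), i.e. chr(160). You should either replace it with a regular space or use the exact name in the mapping.
So change the problematic column name in rename like this: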
df = df.rename(columns={'Country,Other': 'Country_or_Other',
'Serious,Critical': 'Serious_or_Critical',
'Tot\xa0Cases/1M pop':'Cases_1M_pop',
'Deaths/1M pop':'Deaths_per_1M_pop',
'Tests/ 1M pop':'Tests_per_1M_pop'})
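The other option mentioned above, replacing the non-breaking space with a regular one, can be sketched as follows. It assumes the headers are plain strings; the empty frame only reproduces the question's column names. After the cleanup, the original rename mapping with ordinary spaces works unchanged.

```python
import pandas as pd

# Empty frame reproducing the question's headers (one contains \xa0)
df = pd.DataFrame(columns=["Country,Other", "Serious,Critical", "Tot\xa0Cases/1M pop",
                           "Deaths/1M pop", "Tests/ 1M pop"])

# Swap every non-breaking space (U+00A0) in the headers for a regular space
df.columns = df.columns.str.replace("\xa0", " ", regex=False)

# Now the keys can be typed with normal spaces
df = df.rename(columns={"Country,Other": "Country_or_Other",
                        "Serious,Critical": "Serious_or_Critical",
                        "Tot Cases/1M pop": "Cases_1M_pop",
                        "Deaths/1M pop": "Deaths_per_1M_pop",
                        "Tests/ 1M pop": "Tests_per_1M_pop"})
print(df.columns.tolist())
# → ['Country_or_Other', 'Serious_or_Critical', 'Cases_1M_pop', 'Deaths_per_1M_pop', 'Tests_per_1M_pop']
```

Cleaning the headers right after loading the data means later code never has to know the \xa0 was there.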