user3586164

Reputation: 195

Python/Pandas DataFrame.drop can't recognize column names in Chinese characters

Here is my Jupyter script. Any suggestions as to why the line marked "Does NOT work" fails in the following?

import pandas as pd
df = pd.read_csv('hw1.csv', encoding='utf-8', skipinitialspace=True )
df.drop(['序号'], axis=1, inplace=True) # <= Works
#df.drop(['年度'], axis=1, inplace=True) # <= Does NOT work
df

----- hw1.csv file -----
序号,年度,直接排放,间接排放,直接排放间接排放,一般烟煤,汽油,柴油,液化石油气,炼厂干气,天然气
1,2016,4647.09,4843.06,9490.15,2004.98,,136.08,13.9,,45.1816
2,2016,2496.72,3668.16,6164.879999999999,1368.83,,,28.02,,10.593
3,2016,10729.74,4042.2,14771.94,6681.8,,,20.6,,
4,2016,231163.34,206918.68,438082.02,52330.48,,13758.75,997.81,,4690.22
5,2016,7373.27,4994.84,12368.11,3566.25,,,123.6,,60.9229
6,2016,62619.53,3324.15,65943.68,,,,,,2896.1175

Upvotes: 0

Views: 752

Answers (1)

DYZ

Reputation: 57105

All of your column titles except the first one start with an invisible byte order mark (BOM), '\ufeff'. Remove it before attempting any column-related operation:

'年度' in df.columns
# False
df.columns = [s.replace(u'\ufeff', '') for s in df.columns]
'年度' in df.columns
# True
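If you want to confirm the hidden character yourself before cleaning, here is a minimal sketch. It does not read hw1.csv; instead it builds a small stand-in DataFrame (hypothetical data, with the BOM placed on every column name except the first, as described above) so it can run on its own. The names bom_columns and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the frame loaded from hw1.csv:
# every column name except the first starts with the BOM character.
df = pd.DataFrame({
    "序号": [1, 2],
    "\ufeff年度": [2016, 2016],
    "\ufeff直接排放": [4647.09, 2496.72],
})

# ascii() escapes non-ASCII characters, so the invisible BOM shows up as \ufeff.
print([ascii(name) for name in df.columns])

# Record which column names are affected before cleaning.
bom_columns = [name for name in df.columns if "\ufeff" in name]

# Strip the BOM from every column name (literal match, not a regex).
df.columns = df.columns.str.replace("\ufeff", "", regex=False)

# The drop that failed before now finds the column.
df = df.drop(columns=["年度"])
print(list(df.columns))
```

Printing the names with ascii() is the useful part: a plain print(df.columns) looks identical with or without the BOM, which is why the failing drop is so confusing.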

Upvotes: 3
