Reputation: 352
I have a dictionary (data_final) of dataframes (health, education, economy,...). The dataframes contain data from one xlsx file. In one of the dataframes (economy), the column names have brackets and single quotes added to them.
data_final['economy'].columns =
Index([ ('Sr.No.',),
('DistrictName',),
('Agriculture',),
('Forestry& Logging',),
('Fishing',),
('Mining &Quarrying',),
('ManufacturingMFG.',),
('RegisteredMFG.',),
('Unregd. MFG.',),
('Electricity,Gas & W.supply',),
('Construction',),
('Trade,Hotels& Restaurants',),
('Railways',),
('Transportby other means',),
('Storage',),
('Communication',),
('Banking &Insurance',),
('Real, Ownership of Dwel. B.Ser.& Legal',),
('PublicAdministration',),
('OtherServices',),
('TotalDDP',),
('Population(In '00)',),
('Per CapitaIncome(Rs.)',)],
dtype='object')
I cannot reference any column. For example,
data_final['economy']['('Construction',)']
gives the error:
SyntaxError: invalid syntax
I tried to use replace to remove the brackets -
data_final['economy'].columns = pd.DataFrame(data_final['economy'].columns).replace("(","",regex=True))
But this does not fix the column names. How can I remove all these special characters from the column names?
Upvotes: 0
Views: 1367
Reputation: 14369
The syntax error should be related to the line
('Population(In '00)',),
The string contains a single quotation mark, which would normally mark the end of the string. To use one inside a string, either surround the string with double quotes (") or escape it as \', resulting in a line like:
('Population(In \'00)',),
The same problem applies to your actual call, you have to escape the quotation mark there as well.
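As a rough illustration of the quoting rules described above, here is a minimal sketch in plain Python (no pandas needed). The variable names are made up for the example. It also shows that a string that merely looks like a tuple is not the same object as the tuple itself.

```python
# Two equivalent ways to write a string that contains a single quote.
escaped = 'Population(In \'00)'       # quote escaped with a backslash
double_quoted = "Population(In '00)"  # string wrapped in double quotes
print(escaped == double_quoted)

# The lookup key from the question, written so that it parses.
# Note the difference between the text of a tuple and the tuple itself.
key_as_text = "('Construction',)"   # a str that looks like a tuple
key_as_tuple = ('Construction',)    # an actual one-element tuple
print(key_as_text == key_as_tuple)
```

If the column labels really are tuples, only the second form of the key would be expected to match.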
Upvotes: 0
Reputation: 344
It looks as though your column names are being imported/created as tuples. What happens if you try to reference them without the brackets but with a trailing comma, like so:
data_final['economy']['Construction',]
or even with the brackets
data_final['economy'][('Construction',)]
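If the labels do turn out to be one-element tuples, one possible way to get plain string names is to flatten them before assigning them back. This is only a sketch: flatten_names is a hypothetical helper, not a pandas function, and the sample list below stands in for the real column index.

```python
def flatten_names(columns):
    """Turn tuple labels such as ('Construction',) into plain strings."""
    flat = []
    for name in columns:
        if isinstance(name, tuple):
            # Join the parts so longer tuples also become a single string.
            flat.append(" ".join(str(part) for part in name))
        else:
            # Leave labels that are already plain values untouched.
            flat.append(name)
    return flat


# Stand-in for data_final['economy'].columns (assumed to hold 1-tuples).
raw = [('Sr.No.',), ('DistrictName',), ('Construction',), ("Population(In '00)",)]
clean = flatten_names(raw)
print(clean)
```

Assigning the result back, for example data_final['economy'].columns = flatten_names(data_final['economy'].columns), should then allow data_final['economy']['Construction'] to work, assuming the labels are tuples as suspected.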
Upvotes: 3