Rohan Bapat

Reputation: 352

Remove special characters from column headers

I have a dictionary (data_final) of dataframes (health, education, economy, ...). All the dataframes contain data from a single xlsx file. In one of them (economy), the column names have brackets and single quotes added to them.

data_final['economy'].columns = 
Index([                                ('Sr.No.',),
                                 ('DistrictName',),
                                  ('Agriculture',),
                            ('Forestry& Logging',),
                                      ('Fishing',),
                            ('Mining &Quarrying',),
                            ('ManufacturingMFG.',),
                               ('RegisteredMFG.',),
                                 ('Unregd. MFG.',),
                   ('Electricity,Gas & W.supply',),
                                 ('Construction',),
                    ('Trade,Hotels& Restaurants',),
                                     ('Railways',),
                      ('Transportby other means',),
                                      ('Storage',),
                                ('Communication',),
                           ('Banking &Insurance',),
       ('Real, Ownership of Dwel. B.Ser.& Legal',),
                         ('PublicAdministration',),
                                ('OtherServices',),
                                     ('TotalDDP',),
                           ('Population(In '00)',),
                        ('Per CapitaIncome(Rs.)',)],
      dtype='object')

I cannot reference any column. For example,

data_final['economy']['('Construction',)']

gives the error -

SyntaxError: invalid syntax

I tried to use replace to remove the brackets -

data_final['economy'].columns = pd.DataFrame(data_final['economy'].columns).replace("(","",regex=True))

But this does not fix the column names. How can I remove all these special characters from the column names?

Upvotes: 0

Views: 1367

Answers (2)

Klaus D.

Reputation: 14369

The syntax error is probably caused by the line

('Population(In '00)',),

The string contains a single quotation mark, which would normally mark the end of the string. If you want to use one inside a string, you have to either surround the string with " or escape the quotation mark as \'. That results in a line like:

('Population(In \'00)',),

The same problem applies to your actual call; you have to escape the quotation mark there as well.
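
As a minimal sketch of the two quoting options described above (the variable names are only illustrative and do not come from the question), both spellings produce the same string:

```python
# Option 1: keep single quotes as delimiters and escape the inner quote.
label_escaped = 'Population(In \'00)'

# Option 2: use double quotes as delimiters so the inner single quote needs no escape.
label_double = "Population(In '00)"

# Both literals denote exactly the same text.
same_text = (label_escaped == label_double)
print(same_text)
```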

Upvotes: 0

Ed.

Reputation: 344

It looks as though your column names are being imported/created as tuples. What happens if you try to reference them without the brackets, but leaving a comma on the end, like so:

data_final['economy']['Construction',]

or even with the brackets

data_final['economy'][('Construction',)]
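
If the labels really are one-element tuples, another option is to unwrap them once so that plain strings work as keys. The following is only a sketch under that assumption: the small dataframe is a made-up stand-in for data_final['economy'], and flatten_label is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up stand-in for data_final['economy']: the column labels are 1-tuples.
tuple_labels = pd.Index(
    [('Sr.No.',), ('Construction',), ("Population(In '00)",)],
    tupleize_cols=False,  # keep the tuples as plain objects, not a MultiIndex
)
economy = pd.DataFrame([[1, 10.5, 200]])
economy.columns = tuple_labels


def flatten_label(label):
    """Hypothetical helper: unwrap a 1-tuple label, leave anything else as is."""
    if isinstance(label, tuple) and len(label) == 1:
        return label[0]
    return label


# Replace the tuple labels with the plain strings they contain.
economy.columns = [flatten_label(label) for label in economy.columns]

# Columns can now be referenced by their plain names.
construction = economy['Construction']
```

After this, the brackets and quotes no longer appear in the column listing, because they were only the printed form of the tuples and not characters stored in the names.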

Upvotes: 3
