RustyShackleford

Reputation: 3667

Why does a column name in my DataFrame have symbols next to it?

I am reading in a CSV file, but when I take a closer look at the column names, there is a weird symbol next to the first column name. Can anyone help me get rid of this symbol?

How the column names look now (I'm not sure what the symbols next to 'year' mean):

['ï»¿year', 'sch', 'city', 'prop_id']

How I want the column names to look:

['year', 'sch', 'city', 'prop_id']

My code so far:

import pandas as pd

path = ('file_path')

cameron_county = pd.read_table(path + '/2016_GCC_prelim_appraisal_info_20160630.txt',
                             encoding = 'latin1',error_bad_lines = False)

print(cameron_county.head(1))
print(cameron_county.columns)

Thank you in advance.

Upvotes: 2

Views: 1770

Answers (2)

piRSquared

Reputation: 294218

A post-import solution might look like this:

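columns = pd.Index(['ï»¿year', 'sch', 'city', 'prop_id'])
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
columns.str.replace(r'[^a-zA-Z0-9_-]', '', regex=True)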

Index([u'year', u'sch', u'city', u'prop_id'], dtype='object')
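A runnable sketch of the same post-import idea applied to a DataFrame's own columns, assuming Python 3 and pandas are installed. The header bytes are a hypothetical stand-in for the appraisal file: a UTF-8 BOM followed by tab-separated data, read with encoding='latin1' as in the question so the first name comes back as 'ï»¿year'.

```python
import io

import pandas as pd

# Hypothetical sample bytes standing in for the appraisal file: a UTF-8 BOM
# (EF BB BF) followed by a tab-separated header row and one made-up data row.
raw = b"\xef\xbb\xbfyear\tsch\tcity\tprop_id\n2016\t1\tX\t100\n"

# Decoding UTF-8 bytes as latin1 turns the BOM into three visible characters,
# so the first header comes back as 'ï»¿year'.
df = pd.read_table(io.BytesIO(raw), encoding="latin1")
before = list(df.columns)

# Post-import fix: drop every character that is not a letter, digit,
# underscore or hyphen, and assign the cleaned Index back to the DataFrame.
df.columns = df.columns.str.replace(r"[^a-zA-Z0-9_-]", "", regex=True)
print(list(df.columns))  # ['year', 'sch', 'city', 'prop_id']
```

This pattern also strips spaces and other punctuation from every column name, so it suits names that are expected to be plain identifiers like these.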

Upvotes: 2

EdChum

Reputation: 393943

This looks like a Unicode BOM (byte order mark). Try:

# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
cameron_county = pd.read_table(path + '/2016_GCC_prelim_appraisal_info_20160630.txt',
                               encoding='utf-8', on_bad_lines='skip')

See: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

ï»¿ is the CP1252 representation of the UTF-8 BOM, whose hex code is EF BB BF.
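To see where the three characters come from, here is a small Python 3 sketch. It decodes the BOM bytes EF BB BF as CP1252, then compares plain utf-8 with Python's utf-8-sig codec, which removes a leading BOM. Passing utf-8-sig to pandas is an alternative to the answer's utf-8, not something the answer itself suggests, and the tab-separated sample bytes are hypothetical.

```python
import io

import pandas as pd

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = b"\xef\xbb\xbf"

# Decoded as CP1252 (latin1 gives the same result for these bytes),
# each byte becomes one visible character: 'ï»¿'.
as_cp1252 = bom.decode("cp1252")

# Plain "utf-8" keeps the BOM as the invisible character U+FEFF,
# while Python's "utf-8-sig" codec strips a leading BOM.
as_utf8 = (bom + b"year").decode("utf-8")          # '\ufeffyear'
as_utf8_sig = (bom + b"year").decode("utf-8-sig")  # 'year'

# The same codec passed to pandas; the bytes are a hypothetical stand-in
# for the appraisal file.
raw = bom + b"year\tsch\tcity\tprop_id\n2016\t1\tX\t100\n"
df = pd.read_table(io.BytesIO(raw), encoding="utf-8-sig")
print(list(df.columns))  # ['year', 'sch', 'city', 'prop_id']
```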

Upvotes: 2
