Reputation: 3667
I am reading in a CSV, but when I take a closer look at the column names there is a weird symbol next to the first column name. Can anyone help me get rid of this symbol?
How the column names look now (I'm not sure what the symbols next to 'year' mean):
['ï»¿year', 'sch', 'city', 'prop_id']
How I want the column names to look:
['year', 'sch', 'city', 'prop_id']
My code so far:
import pandas as pd
path = 'file_path'
cameron_county = pd.read_table(path + '/2016_GCC_prelim_appraisal_info_20160630.txt',
                               encoding='latin1', error_bad_lines=False)
print(cameron_county.head(1))
print(cameron_county.columns)
Thank you in advance.
Upvotes: 2
Views: 1770
Reputation: 294218
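A post-import solution might look like:
columns = pd.Index(['ï»¿year', 'sch', 'city', 'prop_id'])
# regex=True is needed on recent pandas, where str.replace treats the pattern literally by default
columns.str.replace(r'[^a-zA-Z0-9_-]', '', regex=True)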
Index([u'year', u'sch', u'city', u'prop_id'], dtype='object')
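The regex above drops every character outside `a-zA-Z0-9_-`, so it would also remove spaces or accented letters from other column names. A narrower variant, sketched below, strips only a leading BOM or its latin1 form. The names `BOM_PREFIXES` and `strip_bom` are made up for this illustration and are not part of pandas.

```python
# Hypothetical helper, not a pandas API: remove only a leading BOM
# (or the three characters it turns into when UTF-8 is read as latin1/CP1252).
BOM_PREFIXES = (
    "\ufeff",              # the BOM as a single Unicode character
    "\u00ef\u00bb\u00bf",  # "ï»¿", the BOM bytes EF BB BF decoded as latin1/CP1252
)


def strip_bom(name: str) -> str:
    """Return name without a leading BOM prefix; other characters are untouched."""
    for prefix in BOM_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


columns = ["\u00ef\u00bb\u00bfyear", "sch", "city", "prop_id"]
cleaned = [strip_bom(c) for c in columns]
print(cleaned)  # ['year', 'sch', 'city', 'prop_id']
```

On a DataFrame this could presumably be applied with `cameron_county.rename(columns=strip_bom)`, since `rename` accepts a function for `columns`.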
Upvotes: 2
Reputation: 393943
This looks like a Unicode BOM (byte order mark). Try:
cameron_county = pd.read_table(path + '/2016_GCC_prelim_appraisal_info_20160630.txt',
                               encoding='utf-8', error_bad_lines=False)
See: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

ï»¿ is the CP1252 representation of the UTF-8 BOM, whose bytes in hex are EF BB BF.
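To see where the symbols come from, here is a small standard-library sketch that decodes the same bytes several ways. The header line is a made-up stand-in for the first line of the real file, assuming it is tab-separated and starts with a UTF-8 BOM.

```python
import codecs

# Assumed sample: a UTF-8 BOM followed by a tab-separated header line.
raw_header = codecs.BOM_UTF8 + "year\tsch\tcity\tprop_id".encode("utf-8")
print(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'

# latin1 and CP1252 map each BOM byte to its own character: ï, », ¿.
as_latin1 = raw_header.decode("latin1").split("\t")
as_cp1252 = raw_header.decode("cp1252").split("\t")
print(ascii(as_latin1[0]))  # '\xef\xbb\xbfyear'

# Plain utf-8 decodes the BOM to one invisible character, U+FEFF.
as_utf8 = raw_header.decode("utf-8").split("\t")
print(ascii(as_utf8[0]))  # '\ufeffyear'

# utf-8-sig consumes the BOM, leaving the clean name.
as_utf8_sig = raw_header.decode("utf-8-sig").split("\t")
print(ascii(as_utf8_sig[0]))  # 'year'
```

If plain `utf-8` still leaves an invisible character in the first column name, passing `encoding='utf-8-sig'` to `pd.read_table` is worth trying, since that codec is the one that drops the BOM while decoding.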
Upvotes: 2