Reputation: 346
I'm seeing things like ’ after reading a text file with read_table(). The input file's contents appear as ordinary ASCII characters in Windows Notepad.
dataRaw = pd.read_table('data.txt', header=None)
Do I need to include some character set parameter to prevent this?
Upvotes: 0
Views: 2278
Reputation: 346
I figured it out. It took two steps: (1) use the correct encoding; (2) convert things that are supposed to be apostrophes to apostrophes.
import re

with open(dataPath, encoding='utf-8') as f:
    for line in f:
        outstr = re.sub(r'[´]', '’', line)  # replace non-ASCII tick with apostrophe
        outstr = re.sub('[\']', '’', outstr)  # replace single quote with apostrophe
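For anyone wondering where the ’ sequence comes from, here is a minimal sketch. It assumes the file was saved as UTF-8 and decoded as Windows-1252 (cp1252), which is a common cause but not the only possible one. The helper `normalize_apostrophes` is a hypothetical name, not something from pandas or the standard library. With pandas itself, `read_table` accepts an `encoding` argument, so something like `pd.read_table('data.txt', header=None, encoding='utf-8')` would be the first thing to try.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe
curly = "\u2019"

# Bytes as they would sit on disk in a UTF-8 file
raw = curly.encode("utf-8")

# Decoding those bytes with the wrong codec gives three characters:
# U+00E2, U+20AC, U+2122, which display as the garbled sequence in the question
garbled = raw.decode("cp1252")

# Decoding with the matching codec recovers the original character
restored = raw.decode("utf-8")


def normalize_apostrophes(text):
    """Map the acute accent (U+00B4) and the ASCII single quote to U+2019.

    Hypothetical helper: same mapping as the two re.sub calls above,
    done in a single pass with str.translate.
    """
    table = str.maketrans({"\u00b4": "\u2019", "'": "\u2019"})
    return text.translate(table)


print(repr(garbled))
print(repr(restored))
print(normalize_apostrophes("it\u00b4s Bob's file"))
```

If the text still looks wrong with `encoding='utf-8'`, the file may be in a different encoding, and it is worth checking how it was saved before reaching for replacements.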
Thanks for the tip.
Upvotes: 1