AltShift

Reputation: 346

Pandas read_table returns ’ characters

I'm seeing things like ’ after reading a text file with read_table(). The input file contents appear as ordinary ASCII characters in Windows Notepad.

import pandas as pd
dataRaw = pd.read_table('data.txt', header=None)

Do I need to include some character set parameter to prevent this?

Upvotes: 0

Views: 2278

Answers (1)

AltShift

Reputation: 346

I figured it out. It took two steps: (1) read the file with the correct encoding; (2) replace the characters that are meant to be apostrophes with an actual apostrophe.

import re

with open(dataPath, encoding='utf-8') as f:
    for line in f:
        outstr = re.sub(r'[´]', '’', line)  # replace non-ASCII tick (acute accent) with apostrophe
        outstr = re.sub(r"'", '’', outstr)  # replace single quote with apostrophe
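For anyone wondering where the odd characters come from, here is a minimal sketch, assuming the file is UTF-8 and was being decoded as Windows-1252 (the usual default on Windows). It shows how a typographic apostrophe turns into three junk characters under the wrong codec, and how the two steps above behave on a sample string. The helper name normalize_apostrophes and the sample strings are made up for illustration. If the goal is only to load the file with pandas, passing the encoding directly, as in pd.read_table('data.txt', header=None, encoding='utf-8'), should cover step (1).

```python
import re

# U+2019 (right single quotation mark) is the three bytes E2 80 99 in UTF-8.
utf8_bytes = "don\u2019t".encode("utf-8")

# Decoding those bytes as Windows-1252 yields three separate characters
# (a-circumflex, euro sign, trademark sign) instead of one apostrophe.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the codec the file was actually written in recovers the text.
correct = utf8_bytes.decode("utf-8")


def normalize_apostrophes(text: str) -> str:
    """Hypothetical helper: map the acute accent (U+00B4) and the ASCII
    single quote (U+0027) to the typographic apostrophe (U+2019)."""
    return re.sub("[\u00b4']", "\u2019", text)


print(garbled)
print(correct)
print(normalize_apostrophes("it\u00b4s Bob's"))
```

The same round trip can be used as a quick check on a suspicious string: if text.encode('cp1252').decode('utf-8') succeeds and looks right, the text was most likely UTF-8 read with the wrong codec.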

Thanks for the tip.

Upvotes: 1
