Reputation: 1982
I encountered a problem with a Python script I wrote while running it in a Windows CMD window, and boiled the problem down to the following SSCCE:
The Python script (x.py)
import sys
in_file = open (sys.argv[1], 'rt')
for line in in_file:
    line = line.rstrip ('\n')
    print ('line="%s"' % (line))
in_file.close ()
The input data file (x.txt)
Line 1
Line 2 “text”
Line 3
The command line invocation
python x.py x.txt
The error output
C:\junk>python x.py x.txt
line="Line 1"
Traceback (most recent call last):
  File "x.py", line 7, in <module>
    print ('line="%s"' % (line))
  File "C:\Program Files (x86)\Python34\lib\encodings\cp862.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 13: character maps to <undefined>
C:\junk>
It seems to be failing on the second input record ("Line 2"). What am I doing wrong?
Upvotes: 2
Views: 2224
Reputation: 1982
The answer turned out to be a Windows codepage issue.
The second input line contains the bytes 0x93 (147) and 0x94 (148), which in the Windows ANSI code page are the typographical left and right double quotation marks, respectively. Although the input file was meant to be plain ASCII (i.e., all characters < 128 decimal), word processors, unlike text editors, often insert these specialized characters.
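A quick way to confirm that a supposedly ASCII file contains such bytes is to scan it in binary mode. The following is only a sketch: the helper name find_non_ascii and the sample bytes (my guess at how x.txt was saved) are assumptions, not part of the original script.

```python
def find_non_ascii(data):
    """Return (line_number, column, byte_value) for every byte >= 128 in data."""
    hits = []
    for line_no, line in enumerate(data.split(b'\n'), start=1):
        # Iterating over a bytes object yields integers in Python 3.
        for col, byte in enumerate(line, start=1):
            if byte >= 128:
                hits.append((line_no, col, byte))
    return hits

# Hypothetical sample mirroring x.txt as a word processor might save it.
sample = b'Line 1\nLine 2 \x93text\x94\nLine 3\n'
for line_no, col, byte in find_non_ascii(sample):
    print('line %d, column %d: 0x%02X' % (line_no, col, byte))
```

For a real file, the same function could be fed open('x.txt', 'rb').read().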
Python read the file well enough, but threw an exception when trying to print the line to the console window. As the error output shows, the error message emanates from lib\encodings\cp862.py, which corresponds to code page 862, MS-DOS's code page for Hebrew. When reading the file, Python decodes the ANSI byte 0x93 (147) to the Unicode character U+201C ("LEFT DOUBLE QUOTATION MARK"); when printing, it has to encode that character in the console's code page, and code page 862 has no mapping for it.
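The failure can be reproduced without a console or an input file. This sketch assumes the file was decoded as cp1252; the actual ANSI code page on the machine in question is not stated, so treat that choice as an assumption.

```python
raw = b'\x93'

# Decoding step (what happens when the file is read).
# Assumption: the ANSI code page is cp1252, where 0x93 is U+201C.
ch = raw.decode('cp1252')

# Encoding step (what print has to do for a console using code page 862).
try:
    ch.encode('cp862')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print('decoded: %s, encodable in cp862: %s' % (ascii(ch), encodable))
```

The decode succeeds and the encode fails, which matches the traceback: the problem is on the output side, not in reading the file.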
Executing the CHCP (Change Codepage) command gives:
C:\junk>chcp
Active code page: 862
C:\junk>
Changing the CMD window's code page to CP 1252 (Windows Western European, often loosely called "Latin-1"), which does include these quotation marks, solves the problem:
C:\junk>chcp 1252
Active code page: 1252
C:\junk>python x.py x.txt
line="Line 1"
line="Line 2 “text”"
line="Line 3"
C:\junk>
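If changing the code page is not an option, one possible alternative is to make the script tolerate characters the console cannot show. This is not what I actually did, just a sketch: console_safe is a hypothetical helper, and it replaces unencodable characters with backslash escapes instead of raising.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the encoding lacks shown as backslash escapes."""
    # Fall back to ASCII if stdout has no known encoding (e.g. when redirected).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='backslashreplace').decode(encoding)

# Passing 'cp862' explicitly simulates the console from the question.
print(console_safe('line="Line 2 \u201ctext\u201d"', 'cp862'))
```

In x.py, the print call would then become print (console_safe ('line="%s"' % (line))), so the line is shown with escapes on a cp862 console and unchanged on a cp1252 one.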
Upvotes: 2