matth

Reputation: 2696

Python line.replace returns UnicodeEncodeError

I have a tex file that was generated from rst source using Sphinx. It is encoded as UTF-8 without BOM (according to Notepad++) and named final_report.tex, with the following content:

% Generated by Sphinx.
\documentclass[letterpaper,11pt,english]{sphinxmanual}
\usepackage[utf8]{inputenc}
\begin{document}

\chapter{Preface}
Krimson4 is a nice programming language.
Some umlauts äöüßÅö.
That is an “double quotation mark” problem.
Johnny’s apostrophe allows connecting multiple ports.
Components that include data that describe how they ellipsis …
Software interoperability – some dash – is not ok.
\end{document}

Now, before I compile the tex source to pdf, I want to replace some lines in the tex file to get nicer results. My script was inspired by another SO question.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os

newFil=os.path.join("build", "latex", "final_report.tex-new")
oldFil=os.path.join("build", "latex", "final_report.tex")

def freplace(old, new):
    with open(newFil, "wt", encoding="utf-8") as fout:
        with open(oldFil, "rt", encoding="utf-8") as fin:
            for line in fin:
                print(line)
                fout.write(line.replace(old, new))
    os.remove(oldFil)
    os.rename(newFil, oldFil)

freplace(r'\documentclass[letterpaper,11pt,english]{sphinxmanual}', r'\documentclass[letterpaper, 11pt, english]{book}')

This works on Ubuntu 16.04 with Python 2.7 as well as Python 3.5, but it fails on Windows with Python 3.4. The error message I get is:

File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 11: character maps to <undefined>

where U+201C is the left double quotation mark. If I remove the problematic character, the script proceeds until it reaches the next problematic character.

In the end, I need a solution that works on Linux and Windows with Python 2.7 and 3.x. I tried quite a lot of the solutions suggested here on SO, but could not yet find one that works for me...

Upvotes: 2

Views: 195

Answers (1)

Padraic Cunningham

Reputation: 180482

You need to specify the correct encoding explicitly, using the encoding argument of open:

with open(oldFil, "rt", encoding="utf-8") as fin, open(newFil, "wt", encoding="utf-8") as fout:
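Since the question asks for something that runs on both Python 2.7 and 3.x, here is a minimal sketch of the same idea using io.open, which accepts encoding= on 2.7 and is the same function as the built-in open on 3.x. The path-based signature of freplace, the temporary file name, and the u"" literals are illustrative choices, not the asker's exact code.

```python
import io
import os


def freplace(path, old, new):
    """Rewrite `path` in place, replacing `old` with `new` on every line."""
    tmp_path = path + "-new"  # hypothetical temporary name next to the original
    # io.open takes encoding= on Python 2.7 and 3.x alike, so the file is
    # decoded and encoded as UTF-8 regardless of the platform default.
    with io.open(path, "rt", encoding="utf-8") as fin, \
            io.open(tmp_path, "wt", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.replace(old, new))
    # Remove first: on Windows, os.rename fails if the target already exists.
    os.remove(path)
    os.rename(tmp_path, path)
```

A call would look like freplace(os.path.join("build", "latex", "final_report.tex"), u"{sphinxmanual}", u"{book}"). Passing unicode literals keeps the replacement working on Python 2.7, where io.open yields unicode lines.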

If you omit the encoding argument, the platform's preferred encoding is used.

From the documentation for open:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
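To see which defaults apply on a given machine, a small check like the following can help. It is only a diagnostic sketch; the helper names describe_encodings and can_encode are made up for illustration. One plausible reading of the traceback in the question (an assumption, since the full traceback is not shown) is that the cp850 encode step comes from print(line) writing to a Windows console, because that is another place where a platform default encoding is used.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python falls back to on this machine."""
    return {
        # Used by open() in text mode on Python 3 when encoding= is omitted.
        "preferred": locale.getpreferredencoding(False),
        # Used by print() when writing to stdout; None if stdout has been
        # replaced by an object that has no encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def can_encode(text, encoding):
    """Report whether `text` can be encoded with `encoding` without error."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

For example, can_encode(u"\u201c", "cp850") is False, which matches the error message in the question, while can_encode(u"\u201c", "utf-8") is True.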

Upvotes: 2
