Reputation: 7839
As far as I know, these should be equivalent in a system that uses UTF-8 as the default encoding:
pattern1 = 'Wörterbuch Wortformen'.decode('utf8')
pattern2 = u'Wörterbuch Wortformen'
However, when I send these lines from an Emacs buffer to the Python process (M-x python-shell-send-region), something strange happens:
>>> pattern1
u'W\xf6rterbuch Wortformen'
>>> pattern2
u'W\xc3\xb6rterbuch Wortformen'
In a Python shell run in a terminal, both lines result in u'W\xf6rterbuch Wortformen'.
What is going on here?
My locale is configured to use UTF-8.
Upvotes: 1
Views: 375
Reputation:
Here's what I did (it might be helpful later):
Created a file in a single-byte encoding, say /tmp/test.dat.
Opened it in Emacs using hexl-mode.
Inserted the bytes C3 and B6 using the hexl-insert-hex-char command.
Opened the file as text (using text-mode). Emacs recognized it as a file with a multibyte encoding and displayed ö in place of those two bytes.
Conclusion: the buffer that contains the source code must use the utf-8 coding system for Emacs to send two bytes for ö. If the buffer uses a single-byte encoding instead, and you select a locale that maps the byte F6 to ö, you will get that single byte.
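To illustrate the byte-level difference, here is a minimal sketch. It assumes Python 3 rather than the Python 2 syntax used in the question, and the variable names are made up for the example. It only shows how the same two bytes can turn into either of the two results from the question; it does not claim to reproduce what Emacs does internally.

```python
# Python 3 sketch (the question uses Python 2); names are hypothetical.
word = "W\u00f6rterbuch"            # "Wörterbuch"

# A UTF-8 buffer transmits ö as the two bytes C3 B6.
raw = word.encode("utf-8")

# Decoding those bytes as UTF-8 recovers the single character U+00F6.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 yields two characters, U+00C3 and U+00B6.
as_latin1 = raw.decode("latin-1")

print(ascii(as_utf8))    # 'W\xf6rterbuch'
print(ascii(as_latin1))  # 'W\xc3\xb6rterbuch'
```

The second output has the same shape as pattern2 in the question, which suggests the UTF-8 bytes were interpreted one byte per character somewhere along the way.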
PS. Make sure you have the -*- coding: utf-8 -*- comment.
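As a rough sketch of why the coding comment matters, the snippet below compiles the same source bytes under two different declarations. It assumes Python 3, where a plain string literal plays the role of the u'' literal in the question, and literal_under is a hypothetical helper written for this example.

```python
# Python 3 sketch; literal_under is a hypothetical helper, not an Emacs or Python API.
def literal_under(declared):
    # The bytes are always UTF-8; only the declared coding on line 1 changes.
    source = ("# -*- coding: %s -*-\ns = '\u00f6'\n" % declared).encode("utf-8")
    namespace = {}
    # compile() honours the coding declaration when it is given bytes.
    exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace["s"]

print(ascii(literal_under("utf-8")))    # '\xf6'
print(ascii(literal_under("latin-1")))  # '\xc3\xb6'
```

With a declaration that matches the actual bytes, the literal is one character; with a mismatched single-byte declaration, each byte becomes its own character.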
Upvotes: 1