Cedric Martin

Function to transform \u2014, \u2019, etc. to UTF-8 characters

When I copy and paste from web pages to Emacs I often end up with my buffer looking like this:

Here\u2019s a practical example:

Instead of:

Here’s a practical example:

I now have two related issues:

  1. How should I configure Emacs so that from now on, when I copy and paste, I immediately get the UTF-8 characters instead of \uxxxx escapes?

  2. How can I convert all the files I have already saved that contain these bogus escapes?

Is there already a function doing the transformation somewhere that I could simply call?

Upvotes: 1

Answers (2)

J.y B.y

You need to set the coding system used for transferring selections to and from other programs through the window system so that it matches the coding system you use for your files.

That's the function set-selection-coding-system, normally bound to C-x RET x.

My files are UTF-8 by default. I had the same problem as you until I set the coding system for copy/paste (the selection coding system) to UTF-8 as well.
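
A minimal init-file sketch of the setting described above, assuming you want UTF-8 for both files and selections. set-selection-coding-system is the function named in this answer; prefer-coding-system is an extra, optional standard Emacs setting added here for the "files are UTF-8" part. Whether this cures the \uxxxx symptom may depend on where the escapes originate (the window-system selection versus the page's own text).

```elisp
;; Sketch for init.el (assumption: UTF-8 everywhere is what you want).

;; Use UTF-8 when exchanging selections (copy/paste) with other
;; programs through the window system; same as C-x RET x utf-8 RET.
(set-selection-coding-system 'utf-8)

;; Optional: make UTF-8 the preferred coding system for files,
;; buffers and process I/O as well.
(prefer-coding-system 'utf-8)
```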

Upvotes: 0

jpkotta

It appears that Emacs understands these escape codes: if you read one in as part of a Lisp string literal, the reader automatically converts it to the corresponding character. Hopefully this can be made less clunky.

C-M-% \(\\u[0-9A-Fa-f]\{4\}\) RET \,(read (concat "\"" \1 "\"")) RET

If you aren't familiar with \, in replacement strings: it lets you evaluate an arbitrary Lisp expression and use its result as the replacement text.
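
The same replacement can be wrapped in reusable commands, which also covers the second question (fixing files that were already saved). This is a sketch, not an existing Emacs function: the names my-unescape-unicode-region and my-unescape-unicode-files are hypothetical, and instead of calling read on the match it converts the four hex digits with string-to-number, which avoids evaluating buffer text through the Lisp reader. It only handles 4-digit \uXXXX escapes and does not combine UTF-16 surrogate pairs. It will also rewrite escapes that are meant to stay literal (for example in source code or JSON), so review the files first.

```elisp
;; Hypothetical helpers (not part of Emacs): turn literal \uXXXX
;; escape sequences into the characters they denote.

(defun my-unescape-unicode-region (start end)
  "Replace each literal \\uXXXX escape between START and END with its character.
Only 4-digit escapes are handled; surrogate pairs are not combined.
Interactively, act on the active region, or on the whole buffer if none."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (save-excursion
    (save-restriction
      ;; Narrowing keeps the search bounded even though each
      ;; replacement changes the length of the text.
      (narrow-to-region start end)
      (goto-char (point-min))
      ;; Regexp: a literal backslash, "u", then exactly four hex digits.
      (while (re-search-forward "\\\\u\\([0-9A-Fa-f]\\{4\\}\\)" nil t)
        ;; Hex digits -> character code -> one-character string,
        ;; inserted literally (no case conversion, no \ processing).
        (replace-match (string (string-to-number (match-string 1) 16))
                       t t)))))

(defun my-unescape-unicode-files (files)
  "Unescape \\uXXXX sequences in each file in FILES and save it as UTF-8."
  (dolist (file files)
    (with-current-buffer (find-file-noselect file)
      (my-unescape-unicode-region (point-min) (point-max))
      ;; Ensure the new non-ASCII characters are written as UTF-8.
      (set-buffer-file-coding-system 'utf-8)
      (save-buffer))))
```

Usage: M-x my-unescape-unicode-region in a buffer (or on a region), or, for a batch of saved files, something like (my-unescape-unicode-files (directory-files "~/notes/" t "\\.txt\\'")), where the directory and pattern are placeholders for your own.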

Upvotes: 2