Function to transform u2014, u2019, etc. to UTF-8 characters

Question

When I copy and paste from web pages to Emacs I often end up with my buffer looking like this:

Here\u2019s a practical example:

Instead of:

Here’s a practical example:

I've now got two different issues (but they're related):

How should I configure Emacs so that, from now on, copying and pasting immediately gives me the UTF-8 characters instead of the \uxxxx escapes?
How can I convert the files I have already saved that contain these escapes?

Is there already a function doing the transformation somewhere that I could simply call?

jpkotta · Accepted Answer

It appears that Emacs understands these escape codes. You can read each one in as a Lisp string, and the Lisp reader converts it automatically. Hopefully this can be made less clunky.

C-M-% \(\\u[0-9A-Fa-f]\{4\}\) RET \,(read (concat "\"" \1 "\"")) RET

If you aren't familiar with \, in replacement strings, it lets you evaluate an arbitrary Lisp expression and use its result as the replacement.
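
To have something callable with M-x, the same idea can be wrapped in a command. This is only a minimal sketch: the name my/unescape-unicode-region is made up for illustration, and instead of calling read as above it parses the four hex digits with string-to-number. It assumes plain four-digit escapes and does not combine surrogate pairs.

```elisp
;; Hypothetical helper; the name is illustrative and not part of Emacs.
(defun my/unescape-unicode-region (beg end)
  "Replace \\uXXXX escapes between BEG and END with the characters they name.
Interactively, act on the active region, or on the whole buffer
when no region is active."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Search case-sensitively so only a lowercase \u prefix matches.
      (let ((case-fold-search nil))
        (while (re-search-forward "\\\\u\\([0-9A-Fa-f]\\{4\\}\\)" nil t)
          ;; Parse the hex digits and substitute the character literally.
          (replace-match
           (string (string-to-number (match-string 1) 16))
           t t))))))
```

After evaluating the definition, M-x my/unescape-unicode-region converts the active region, or the whole buffer if there is none. For files that are already saved, one option is to visit each file, run the command, and save.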
