Reputation: 775
Say you have a file that contains both correct UTF-8 characters and UTF-8 characters that were once read by a program that thought they were ISO-8859-1, and were then written back out as UTF-8. So you have things like "Ã©" instead of "é". How do you fix that?
Upvotes: 1
Views: 232
Reputation: 775
I finally came up with a single sed command that did the job for me:
LANG='' sed -re 's/(\xc3)\x83\xc2([\x80-\xbf])/\1\2/g'
It does not handle Unicode code points U+00A0 to U+00BF (their UTF-8 encoding starts with \xc2 rather than \xc3), but it should be pretty easy to adapt for those.
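For what it's worth, here is a rough sketch of one way that adaptation might look. It assumes GNU sed (for the `\xHH` escapes and `-r`). The function name `fix_double_utf8` is made up for illustration, and `LC_ALL=C` is my substitution for `LANG=''`, so that byte-wise matching still holds when `LC_ALL` is set in the environment. Both cases are handled in a single pass with one alternation, relying on unmatched groups expanding to nothing in the replacement.

```shell
# Hypothetical helper (name is illustrative): undo one round of
# "UTF-8 bytes misread as ISO-8859-1 and re-encoded as UTF-8".
# Reads stdin, writes stdout. Assumes GNU sed.
fix_double_utf8() {
  # Alternative 1: c3 83 c2 XX -> c3 XX   (U+00C0..U+00FF; \1 = c3, \2 = XX)
  # Alternative 2: c3 82 c2 XX -> c2 XX   (U+0080..U+00BF; \3 = c2 XX)
  # Whichever alternative did not match leaves its groups empty.
  LC_ALL=C sed -r 's/(\xc3)\x83\xc2([\x80-\xbf])|\xc3\x82(\xc2[\x80-\xbf])/\1\2\3/g'
}
```

You would use it as `fix_double_utf8 < broken.txt > fixed.txt` (file names are placeholders). As with the original command, text that legitimately contains "Ã" or "Â" followed by a character in U+0080 to U+00BF would be rewritten too, so it is worth diffing the result before replacing the original file.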
Upvotes: 1