Reputation: 91
I've downloaded a JSON file with my conversation archive, and I'm stuck with an odd encoding.
Example of the JSON:
{
"sender_name": "Micha\u00c5\u0082",
"timestamp": 1411741499,
"content": "b\u00c4\u0099d\u00c4\u0099",
"type": "Generic"
},
It should be something like this:
{
"sender_name": "Michał",
"timestamp": 1411741499,
"content": "będę",
"type": "Generic"
},
I'm trying to deserialize it like this:
var result = File.ReadAllText(jsonPath, encodingIn);
JavaScriptSerializer serializer = new JavaScriptSerializer();
serializer.MaxJsonLength = Int32.MaxValue;
var conversation = serializer.Deserialize<Conversation>(System.Net.WebUtility.HtmlDecode(result));
Unfortunately, the output looks like this:
{
"sender_name": "MichaÅ\u0082",
"timestamp": 1411741499,
"content": "bÄ\u0099dÄ\u0099",
"type": "Generic"
},
Does anyone know how Facebook encodes this JSON? I've tried several methods without success.
Thanks for your help.
Upvotes: 4
Views: 2096
Reputation: 2533
Here is a Java equivalent of the C# answer, for those interested in a Java version. It seems to work well: you pass the entire message text into the method, and it returns the original message as it appeared in Messenger before you downloaded this JSON nightmare that Facebook puts out.
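import java.nio.charset.StandardCharsets;
import org.apache.commons.text.StringEscapeUtils; // Commons Lang 3 also has an older, deprecated copy

private String decodeString(String text) {
// Turn the JSON unicode escapes into chars; each char now holds one byte of the real UTF-8 text
String unescapeText = StringEscapeUtils.unescapeJava(text);
// ISO-8859-1 maps chars U+0000..U+00FF back to the same byte values, which are then decoded as UTF-8
return new String(unescapeText.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
}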
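If you'd rather not pull in a library just for StringEscapeUtils, the same two steps can be sketched with the JDK alone. This is a minimal sketch under a few assumptions: MessengerDecode, unescapeUnicode and repairMojibake are hypothetical names, and the unescaper only decodes backslash-u-XXXX sequences, copying every other escape pair through untouched.

```java
import java.nio.charset.StandardCharsets;

public class MessengerDecode {

    // Hypothetical helper: replaces each JSON unicode escape (backslash, 'u', four hex digits)
    // with the char it names. Any other escape pair is copied through unchanged.
    static String unescapeUnicode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                if (raw.charAt(i + 1) == 'u' && isHex4(raw, i + 2)) {
                    out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    // Copy the pair as-is, so an escaped backslash followed by 'u' is not misread
                    out.append(c).append(raw.charAt(i + 1));
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the four chars starting at 'from' are all hexadecimal digits
    private static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: each char of the mis-decoded string is really one UTF-8 byte
    // (U+0000..U+00FF). ISO-8859-1 maps those chars back to the same byte values, which
    // are then decoded as UTF-8. Chars above U+00FF would turn into '?', so this assumes
    // the input only contains escaped bytes, as in the sample export.
    static String repairMojibake(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Raw exported text in, readable text out
    static String decodeString(String raw) {
        return repairMojibake(unescapeUnicode(raw));
    }

    public static void main(String[] args) {
        System.out.println(decodeString("Micha\\u00c5\\u0082"));
        System.out.println(decodeString("b\\u00c4\\u0099d\\u00c4\\u0099"));
    }
}
```

Running main prints Michał and będę, provided your console uses UTF-8. If you already parse the file with a JSON library, its strings have had their escapes decoded, so calling repairMojibake on each string field is enough. That route also avoids turning an escaped quote or newline (written as a unicode escape) into a raw character that would break the JSON before it is parsed.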
Upvotes: 0
Reputation: 91
Here is the answer:
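using System.Text;
using System.Text.RegularExpressions;

private string DecodeString(string text)
{
// ISO-8859-1 maps each char U+0000..U+00FF to the single byte with the same value
Encoding targetEncoding = Encoding.GetEncoding("ISO-8859-1");
// Turn the \u00XX escapes into chars; each char now holds one byte of the real UTF-8 text
var unescapeText = Regex.Unescape(text);
// Recover those bytes and decode them as the UTF-8 they really are
return Encoding.UTF8.GetString(targetEncoding.GetBytes(unescapeText));
}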
I've collected all the answers, combined them, and here we are. Thank you.
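As for how the export is produced: judging only from the sample in the question, it appears to take the UTF-8 bytes of each string and write every non-ASCII byte as a separate \u00XX escape. A JSON parser then reads each escape as one character in the U+0080..U+00FF range, which gives exactly the MichaÅ\u0082 output shown above. Here is a sketch that reproduces that inferred scheme; ExportEncodingDemo and escapeLikeExport are hypothetical names, and this is an inference from two sample strings, not Facebook's actual exporter.

```java
import java.nio.charset.StandardCharsets;

public class ExportEncodingDemo {

    // Hypothetical helper that mimics the escaping seen in the sample export:
    // encode the text as UTF-8, then emit every byte >= 0x80 as its own JSON unicode
    // escape, so a 2-byte character such as U+0142 becomes two escapes (00c5, 0082).
    // ASCII is copied unchanged; a real JSON writer would also escape quotes,
    // backslashes and control characters, which this sketch ignores.
    static String escapeLikeExport(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            if (unsigned < 0x80) {
                out.append((char) unsigned);
            } else {
                out.append(String.format("\\u%04x", unsigned));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Michał and będę, written with Java unicode escapes to keep the source ASCII-only
        System.out.println(escapeLikeExport("Micha\u0142"));
        System.out.println(escapeLikeExport("b\u0119d\u0119"));
    }
}
```

The two printed lines match the sender_name and content values in the question's sample. That is why decoding means reversing these steps: undo the escapes, treat each resulting char as a byte, and decode the bytes as UTF-8.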
Upvotes: 5