Writing JSON to file gives?

Question

I have a JSON file containing country names, languages, and so on. I want to strip it down to just the information I need. For example, I would like to turn

[{
    "name": {
        "common": "Afghanistan",
        "official": "Islamic Republic of Afghanistan",
        "native": {
            "common": "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
            "official": "\u062f \u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646 \u0627\u0633\u0644\u0627\u0645\u064a \u062c\u0645\u0647\u0648\u0631\u06cc\u062a"
        }
    },
    "tld": [".af"],
    "cca2": "AF",
    "ccn3": "004",
    "cca3": "AFG",
    "currency": ["AFN"],
    "callingCode": ["93"],
    "capital": "Kabul",
    "altSpellings": ["AF", "Af\u0121\u0101nist\u0101n"],
    "relevance": "0",
    "region": "Asia",
    "subregion": "Southern Asia",
    "nativeLanguage": "pus",
    "languages": {
        "prs": "Dari",
        "pus": "Pashto",
        "tuk": "Turkmen"
    },
    "translations": {
        "cym": "Affganistan",
        "deu": "Afghanistan",
        "fra": "Afghanistan",
        "hrv": "Afganistan",
        "ita": "Afghanistan",
        "jpn": "\u30a2\u30d5\u30ac\u30cb\u30b9\u30bf\u30f3",
        "nld": "Afghanistan",
        "rus": "\u0410\u0444\u0433\u0430\u043d\u0438\u0441\u0442\u0430\u043d",
        "spa": "Afganist\u00e1n"
    },
    "latlng": [33, 65],
    "demonym": "Afghan",
    "borders": ["IRN", "PAK", "TKM", "UZB", "TJK", "CHN"],
    "area": 652230
}, ...

Into

[{
    "name": {
        "common": "Afghanistan",
        "native": {
            "common": "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646"
        }
    },
    "cca2": "AF"
}, ...

But when I try I get

[{
    "name": {
        "common": "Afghanistan",
        "native": {
            "common": "?????????"   <-- NOT WHAT I WANT
        }
    },
    "cca2": "AF"
},

Here is the important code I used to strip out what I don't want.

byte[] encoded = Files.readAllBytes(Paths.get("countries.json"));
String JSONString =  new String(encoded, Charset.forName("US-ASCII"));
...
Writer writer = new OutputStreamWriter(new FileOutputStream("countriesBetter.json"), "US-ASCII");
writer.write(javaObject.toString());
writer.close();

I cannot figure out why it turns the text into question marks. I have tried several character sets to no avail. When I use UTF-8, I get Ø§Ù�ØºØ§Ù†Ø³ØªØ§Ù†

Please help me. Thank you.

stevegal · Accepted Answer

\u0627 is a Unicode character, not an ASCII one. The Arabic characters cannot be represented in ASCII, so each one is replaced with a ? when you write with US-ASCII. For the differences between the UTF formats, see "Difference between UTF-8 and UTF-16?"
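To see the replacement happen in isolation, here is a minimal sketch. The class and method names (AsciiLossDemo, roundTripAscii, roundTripUtf8) are made up for illustration and are not from the question's code. It encodes the native name from the question to bytes and decodes it again, once with US-ASCII and once with UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLossDemo {
    // Encode to US-ASCII bytes and decode again.
    // String.getBytes(Charset) replaces every unmappable character
    // with the charset's replacement byte, which is '?' for US-ASCII.
    public static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Same round trip with UTF-8, which can represent every Unicode character.
    public static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The native common name from the question (9 Arabic letters).
        String nativeName = "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646";

        System.out.println(roundTripAscii(nativeName));                   // prints ?????????
        System.out.println(roundTripUtf8(nativeName).equals(nativeName)); // prints true
    }
}
```

The nine question marks match the output in the question: one ? per Arabic letter that US-ASCII could not encode.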

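When you write the file as UTF-8, whatever reads it must use the same encoding so that it (Notepad, for example) knows how to interpret the bytes it has. The garbled text you saw is what UTF-8 bytes look like when a viewer decodes them with a different encoding. If you read the file back into Java using UTF-8, the text will be unaltered.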
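A minimal sketch of that round trip, under some assumptions: the class name Utf8RoundTrip and the helpers writeUtf8 and read are hypothetical, and a temporary file stands in for countriesBetter.json. It writes the text as UTF-8, then reads it back once with the matching charset and once with a mismatched single-byte charset (ISO-8859-1 here, standing in for whatever the viewer defaulted to).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    // Write the text to the file as UTF-8 bytes.
    public static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Read the whole file and decode it with the given charset.
    public static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used as a stand-in for the real output file (assumption).
        Path file = Files.createTempFile("countriesBetter", ".json");
        String nativeName = "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646";

        writeUtf8(file, nativeName);

        // Same encoding on both sides: the text comes back unchanged.
        System.out.println(read(file, StandardCharsets.UTF_8).equals(nativeName));      // prints true

        // Mismatched encoding: each UTF-8 byte is shown as its own character.
        System.out.println(read(file, StandardCharsets.ISO_8859_1).equals(nativeName)); // prints false

        Files.delete(file);
    }
}
```

In the question's code, this would correspond to passing "UTF-8" to both the String constructor and the OutputStreamWriter, and then opening the result in an editor set to UTF-8.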
