Ali Arda Orhan

Reputation: 784

XML encoding UTF-8 not working for Turkish characters

I have a method that creates a record and writes it to an XML file, but it produces corrupted output: my Turkish characters are written as hexadecimal expressions. Although I'm setting the encoding to UTF-8, I couldn't solve the problem. I checked the file with both Sublime and Notepad++.

    public boolean add(BatFile batFile) throws Exception {
        File inputFile = new File(fileLocation);
        DocumentBuilderFactory docFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(inputFile);

        Element rootElement = doc.getDocumentElement();

        Element batFileElement = doc.createElement("BatFile");
        rootElement.appendChild(batFileElement);

        Element batJobName = doc.createElement("Name");
        batJobName.appendChild(doc.createTextNode(batFile.getName()));
        batFileElement.appendChild(batJobName);

        Element batFileBriefDesc = doc.createElement("BriefDesc");
        batFileBriefDesc
                .appendChild(doc.createTextNode(batFile.getBriefDesc()));
        batFileElement.appendChild(batFileBriefDesc);

        Element batFileDesc = doc.createElement("Desc");
        batFileDesc.appendChild(doc.createTextNode(batFile.getDesc()));
        batFileElement.appendChild(batFileDesc);

        Element batFileName = doc.createElement("FileName");
        batFileName.appendChild(doc.createTextNode(batFile.getFileName()));
        batFileElement.appendChild(batFileName);

        Element batCommandArgs = doc.createElement("CommandArgs");

        for (int k = 0; k < batFile.getCommandArgs().size(); k++) {
            Element commandArg = doc.createElement("CommandArg");
            // commandArg.setAttribute("ID", String.valueOf(k));
            commandArg.appendChild(doc.createTextNode(batFile.getCommandArgs()
                    .get(k)));
            batCommandArgs.appendChild(commandArg);

        }
        batFileElement.appendChild(batCommandArgs);

        Element batCreationTime = doc.createElement("CreationTime");
        batCreationTime.appendChild(doc.createTextNode(batFile
                .getCreationTime()));
        batFileElement.appendChild(batCreationTime);

        Element batSchedulerPattern = doc.createElement("SchedulerPattern");
        batSchedulerPattern.appendChild(doc.createTextNode(batFile
                .getExecutionPattern()));
        batFileElement.appendChild(batSchedulerPattern);

        Element batTaskID = doc.createElement("TaskID");
        if (batFile.getTaskID() != null) {
            batTaskID.appendChild(doc.createTextNode(batFile.getTaskID()));
        }
        batFileElement.appendChild(batTaskID);

        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        DOMSource domSource = new DOMSource(doc);
        StreamResult result = new StreamResult(new FileWriter(inputFile));
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(domSource, result);
        return true;

    }

When I test it with the code below:

    @Test
    public void testAddingTask() throws Exception {
        IBAO testBao = XMLBAO.getInstance();
        BatFile testBatFile = new BatFile();
        testBatFile.setName("ŞŞŞŞŞ");
        testBatFile.setBriefDesc("ÇÇÇÇÇÇ");
        testBatFile.setDesc("ĞĞĞĞĞĞ");
        testBatFile.setFileName("FileName");
        testBatFile.setCreationTime("Merhaba");
        testBatFile.setExecutionPattern("ööçöçöçüü");
        testBatFile.addCommandArgs("ZZZZZZZZ");
        testBatFile.setTaskID("ÜÜÜÜÜÜÜÜ");
        testBao.add(testBatFile);
    }

It produces this result:

<BatFiles>  
<BatFile>
<Name>???</Name>
<BriefDesc>???</BriefDesc>
<Desc>???</Desc>
<FileName>FileName</FileName>
<CommandArgs>
<CommandArg>ZZZZZZZZ</CommandArg>
</CommandArgs>
<CreationTime>Merhaba</CreationTime>
<SchedulerPattern>??????</SchedulerPattern>
<TaskID>????</TaskID>
</BatFile>
</BatFiles>

Upvotes: 0

Views: 1638

Answers (1)

McDowell

Reputation: 108959

You're writing to a character stream, so the API has no control over which encoding the data is written in. FileWriter uses the default platform encoding, which might not be UTF-8. From its documentation:

The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable.

Use a FileOutputStream with the StreamResult (in a try-with-resources block), so the transformer itself encodes the output as the UTF-8 you requested.
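
As a minimal sketch of that change, the writing step could look something like the following. The class name Utf8XmlWriter, the write helper and the temp file in main are hypothetical names used only for illustration; in the question's add method, only the StreamResult line and the surrounding try block would need to change.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Utf8XmlWriter {

    // Hypothetical helper: serializes a DOM document to a file as UTF-8.
    static void write(Document doc, File target) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // A byte stream (not a Writer) lets the transformer apply the encoding itself.
        try (OutputStream out = new FileOutputStream(target)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document similar in shape to the one in the question.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("BatFiles");
        doc.appendChild(root);
        Element name = doc.createElement("Name");
        name.appendChild(doc.createTextNode("\u015E\u011E\u00DC"));
        root.appendChild(name);

        File target = File.createTempFile("batfiles", ".xml");
        write(doc, target);

        // Read the bytes back as UTF-8 and check that the text survived the round trip.
        String xml = new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8);
        System.out.println(xml.contains("\u015E\u011E\u00DC"));
    }
}
```

If a Writer is really needed, wrapping the FileOutputStream in an OutputStreamWriter with an explicit StandardCharsets.UTF_8 should also avoid the platform default, but handing the transformer a byte stream keeps the XML declaration and the actual bytes in agreement.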


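You might also be having issues due to the Java source file encoding: if the compiler reads the source in a different encoding than the editor saved it in, the string literals are already wrong before any XML is written. Consider using Unicode escapes instead of literals. That is, "\u015E" instead of "Ş".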
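
To illustrate the escape approach, here is a small sketch. The class UnicodeEscapeDemo and its describe helper are made-up names for this example; it prints the code points of the strings so the result can be checked even on a console that cannot display Turkish characters.

```java
class UnicodeEscapeDemo {

    // Written with escapes, so the source file stays pure ASCII and compiles
    // the same way regardless of the encoding javac assumes.
    static final String NAME = "\u015E\u015E";          // S with cedilla, twice
    static final String PATTERN = "\u00F6\u00E7\u00FC"; // o-umlaut, c-cedilla, u-umlaut

    // Formats each char as U+hex so the content can be verified on any console.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(NAME));
        System.out.println(describe(PATTERN));
    }
}
```

If describe reports the expected code points for the test strings but the file still shows question marks, the source encoding can be ruled out and the problem is in how the file is written.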

Upvotes: 1
