Jon Martin Solaas

Reputation: 767

Invalid byte 1 of 1-byte UTF-8 sequence

I have a MyFaces Facelets application where the page coding is a bit rugged. Anyway, it's developed with Eclipse and built with Ant, and kind of runs OK in Tomcat 6.0.26. So far so good.

Now, I'd rather build with Maven, so I made a couple of pom-files, opened them in Netbeans and built, and now I have a war file that deploys ok. However, on any facelet page it barfs out with

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)

So, I've tried a lot of different things, and the application actually runs simple pages without facelet stuff. But everything runs if I just build with Ant instead ... So my question is: what's the most likely difference between an Ant build and a Maven build that may cause this?

It also seems that even though I've configured for UTF-8 in Netbeans and pom-files, Netbeans eventually ends up reporting the facelet files as ISO-8859-1 after some editing.

I've made sure that most central libs are of the same version (especially xerces 2.3.0), and I've added an encoding servlet filter, which had no effect.

And I'd rather fix the Maven build and keep the buggy pages than the other way around ... it's my intention to introduce Maven, not fix buggy pages.

Here is what the pom.xml sets regarding encoding:

 <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.0.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>

....

    <properties>
        <netbeans.hint.deploy.server>Tomcat60</netbeans.hint.deploy.server>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
    </properties>

Upvotes: 2

Views: 30792

Answers (4)

Joman68

Reputation: 2850

I encountered this error when running some unit tests with Maven on a Windows machine.

Files were being written out in the default Windows-1252 encoding, and some tests then failed when trying to read them as UTF-8.

The solution was to enforce project source encoding for files being written out in the unit tests:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.20</version>
        <configuration>
            <argLine>-Dfile.encoding=${project.build.sourceEncoding}</argLine>
        </configuration>
        <dependencies>
            <dependency>
                <groupId>org.apache.maven.surefire</groupId>
                <artifactId>surefire-junit47</artifactId>
                <version>2.20</version>
            </dependency>
        </dependencies>
    </plugin>

Where project.build.sourceEncoding was defined in the pom properties:

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
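
As a complement to the surefire setting, test code can avoid depending on the JVM default at all by naming the charset on every read and write. The following is only a sketch: the class and method names (`ExplicitCharsetIo`, `writeUtf8`, `readUtf8`) are hypothetical, and it assumes Java 7 or later for `java.nio.file` and `StandardCharsets`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
class ExplicitCharsetIo {

    // Encode the text as UTF-8 explicitly instead of relying on file.encoding.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the file as UTF-8 explicitly, independent of the platform default.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".xml");
        writeUtf8(tmp, "<name>bl\u00e5b\u00e6r</name>");
        System.out.println(readUtf8(tmp));
        Files.delete(tmp);
    }
}
```

With this approach the files come out the same on Windows and Linux build machines, whatever `file.encoding` happens to be.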

Upvotes: 0

Ederson Amorim

Reputation: 9

I had the same problem!

I solved it using the following piece of code:

String str = new String(oldstring.getBytes("UTF-8"));
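
Note that the one-argument `String(byte[])` constructor decodes with the JVM's default charset, so the result of the line above can differ between machines. A minimal sketch of what happens when the charset is named on both sides follows; the class `CharsetRoundTrip` and its `decode` method are hypothetical names, and Java 7+ is assumed for `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class CharsetRoundTrip {

    // Decode bytes with an explicitly named charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9 in UTF-8.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Matching charsets: the original character comes back.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals("\u00e9")); // true

        // Mismatched charsets: each byte becomes its own Latin-1 character.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals("\u00c3\u00a9")); // true
    }
}
```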

Upvotes: 0

DaBlick

Reputation: 968

On Windows it's very easy. Get Notepad++ if you don't have it, and change the encoding using the "encoding" menu.

Upvotes: 2

McDowell

Reputation: 108859

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

The cause is that a file that is not UTF-8 is being parsed as UTF-8. It is likely that the parser is encountering a byte value in the range FE-FF. These values are invalid in the UTF-8 encoding.
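
One way to confirm which files are affected is to run their bytes through a strict UTF-8 decoder. This is a rough sketch rather than anything taken from Xerces: `Utf8Check` and `isValidUtf8` are hypothetical names, and Java 7+ is assumed for `StandardCharsets`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class Utf8Check {

    // Returns true only if the whole byte array decodes cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] asUtf8 = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(asUtf8));   // true
        System.out.println(isValidUtf8(asLatin1)); // false: a lone 0xE9 is not a complete UTF-8 sequence
    }
}
```

Feeding it the bytes of each facelet file (for example via `Files.readAllBytes`) should show which pages were saved in something other than UTF-8.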

The problem could probably be solved by changing the XML declaration of the file to be the correct encoding or re-encoding the file to UTF-8.
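
Re-encoding can also be scripted. The sketch below assumes the file really is ISO-8859-1 (as NetBeans reports in the question); the class name `Reencode` and the command-line usage are hypothetical, and Java 7+ is assumed for `java.nio.file`.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original answer.
class Reencode {

    // Rewrites the file in place: decode with 'from', encode with 'to'.
    static void reencode(Path file, Charset from, Charset to) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        String text = new String(raw, from);
        Files.write(file, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java Reencode path/to/page.xhtml
        // Assumption: the input is ISO-8859-1 and has not been converted already.
        reencode(Paths.get(args[0]), StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }
}
```

Running this twice on the same file would double-encode it, so it is worth checking each file's current encoding first. If the file carries an XML declaration, its `encoding` attribute must also agree with the new encoding.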

Upvotes: 3
