user2604052

Reputation: 51

Solving error: unmappable character for encoding UTF8

I have a Maven project, and the character encoding is set to UTF-8 in my parent POM.

    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
        <encoding>UTF-8</encoding>
      </configuration>
    </plugin>

But some characters, such as `, have been used in the Java file, and they are causing a compilation error.

In Eclipse, I have set the encoding to UTF-8 in both places (Properties > Resource > Text file encoding, and Window > Preferences > Workspace > Text file encoding). Please let me know how this issue can be solved.

PERL CODE TO DO CONVERSION STUFF

use strict;
use warnings;
use File::Find;
use open qw/:std :utf8/;

my $dir = "D:\\files";


find({ wanted => \&collectFiles}, "$dir");

sub collectFiles {
    my $filename = $_;
        if($filename =~ /.java$/){
        #print $filename."\n";
        startConversion($filename);
    }
}

sub startConversion{
    my $filename = $_;
    print $filename."\n";
    open(my $INFILE,  '<:encoding(cp1252)',  $filename) or die $!;
    open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;
}

Upvotes: 2

Views: 20809

Answers (2)

David W.

Reputation: 107090

If you're on Linux or Mac OS X, you can use iconv to convert files to UTF-8. The Java 1.7 compiler rejects characters that cannot be mapped in the declared source encoding (here UTF-8), whereas Java 1.6 accepts them and only produces a warning. I know because I have Java 1.7 on my Mac and can't compile some of our code for this reason, while our Windows users and our Linux continuous build machine can, because they still use Java 1.6.

The problem with your Perl script is that you open the same file name for both reading and writing. Opening the file for writing truncates it, so its contents are deleted before you have read anything.

#! /usr/bin/env perl
use strict;
use warnings;
use autodie;    # open and close die on failure

use File::Find;

use constant {
    SOURCE_DIR => 'src',
};

# Collect every *.java file below SOURCE_DIR.
my @file_list;
find sub {
    return unless -f $_;
    return unless /\.java$/;
    push @file_list, $File::Find::name;
}, SOURCE_DIR;

for my $file ( @file_list ) {
    # Read the whole file, decoding it from cp1252.
    open my $in_fh, "<:encoding(cp1252)", $file;
    my @file_contents = <$in_fh>;
    close $in_fh;

    # Only now reopen the same file for writing, encoding it as UTF-8.
    open my $out_fh, ">:encoding(UTF-8)", $file;
    print {$out_fh} @file_contents;
    close $out_fh;
}

Note that I am reading the entire file into memory, which should be okay with Java source code. Even a gargantuan source file (10,000 lines long with an average line length of 120 characters) is only about 1.2 megabytes. Unless you're using a TRS-80, a 1.2 megabyte file shouldn't be a memory issue. If you want to be strict about it, use File::Temp to create a temporary file to write to, and then use File::Copy to move that temporary file over the original. Both are standard Perl modules.
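A minimal sketch of that File::Temp and File::Copy variant, in case it helps. The helper name recode_file and the choice to take file names from the command line are my own assumptions for illustration, not part of the original script:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

use File::Basename qw(dirname);
use File::Copy     qw(move);
use File::Temp     ();

# Hypothetical helper: re-encode one file from cp1252 to UTF-8,
# writing to a temporary file first so the original is never truncated
# while it is still being read.
sub recode_file {
    my ($file) = @_;

    open my $in, '<:encoding(cp1252)', $file;

    # Create the temp file next to the original so the final move stays
    # on the same volume. UNLINK => 0 keeps File::Temp from deleting it.
    my $tmp = File::Temp->new( DIR => dirname($file), UNLINK => 0 );
    binmode $tmp, ':encoding(UTF-8)';

    while ( my $line = <$in> ) {
        print {$tmp} $line;
    }
    close $in;
    close $tmp;

    # Replace the original only after the copy has been written completely.
    move( $tmp->filename, $file ) or die "Cannot replace $file: $!";
    return;
}

# Assumed usage: pass the files to convert as command-line arguments.
recode_file($_) for @ARGV;
```

One thing to be aware of: the replaced file takes on the temp file's permissions, and File::Temp creates its files readable and writable by the owner only, so you may want to restore the original mode afterwards.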

You could also put the entire conversion inside the subroutine passed to find.
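Roughly, that would look like the sketch below. The subroutine name recode_if_java and the optional directory argument are assumptions of mine; the rest follows the script above. Inside the callback, File::Find has already changed into the file's directory, so $_ (the base name) can be opened directly:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

use File::Find;

# Assumption: sources live under ./src unless a directory is given.
my $source_dir = @ARGV ? $ARGV[0] : 'src';

# Called by File::Find for every entry below the start directory.
# $_ holds the base name of the current entry.
sub recode_if_java {
    return unless -f $_;
    return unless /\.java$/;
    my $file = $_;

    # Read everything first (decoding cp1252), then close the handle.
    open my $in_fh, '<:encoding(cp1252)', $file;
    my @lines = <$in_fh>;
    close $in_fh;

    # Reopening for writing truncates the file, which is safe now that
    # its contents are held in @lines.
    open my $out_fh, '>:encoding(UTF-8)', $file;
    print {$out_fh} @lines;
    close $out_fh;
    return;
}

find( \&recode_if_java, $source_dir ) if -d $source_dir;
```

This drops the intermediate file list, at the cost of converting files while the directory tree is still being walked.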

Upvotes: 0

amon

Reputation: 57640

These two lines do not start or perform re-encoding:

open(my $INFILE,  '<:encoding(cp1252)',  $filename) or die $!;
open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;

Opening a file with > truncates it, which deletes the content. See the open documentation for further details.
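If you want to see the truncation for yourself, here is a small throwaway demo. It uses a temp file created with File::Temp, and the variable names are made up for the illustration:

```perl
use strict;
use warnings;
use autodie;

use File::Temp qw(tempfile);

# Create a scratch file with some content in it.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "some content\n";
close $fh;
my $size_before = -s $path;

# Same pattern as in the question: read handle first, then write handle.
open my $in,  '<', $path;
open my $out, '>', $path;    # the file is emptied right here

my $size_after = -s $path;   # size on disk after the second open
my $first_line = <$in>;      # nothing is left to read
close $_ for $in, $out;

printf "before: %d bytes, after: %d bytes, read back: %s\n",
    $size_before, $size_after,
    defined $first_line ? 'something' : 'nothing';
```

The second open empties the file before a single line has been read, which is why the original script leaves every .java file empty.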

Rather, you have to read the data from the first file (which automatically decodes it), and write it back to another file (which automatically encodes it). Because source and target file are identical here, and because of the quirks of file handling under Windows, we should write our output to a temp file:

use autodie;  # automatic error handling :)

open my $in,  '<:encoding(cp1252)', $filename;
open my $out, '>:encoding(UTF-8)', "$filename~";  # or however you'd like to call the tempfile
print {$out} $_ while <$in>;  # copy the file, recoding it
close $_ for $in, $out;

rename "$filename~" => $filename;  # BEWARE: doesn't work across logical volumes!

If the files are small enough (hint: source code usually is), then you could also load them into memory:

use File::Slurp;

my $contents = read_file $filename, { binmode => ':encoding(cp1252)' };
write_file $filename, { binmode => ':encoding(UTF-8)' }, $contents;

Upvotes: 1
