Reputation: 21
I have a Java class that has the following:
public static final Blob COPYRIGHT_MARK = new Blob("div.legal_footer span", "© " + new SimpleDateFormat("yyyy").format(new Date()) + " Acme LLC. All Rights Reserved.");
The project with the class is stored in a git repo and pulled by a Jenkins job to run unit tests. When the file is pulled into the Jenkins job workspace, a character is injected before the copyright symbol in the string:
public static final Blob COPYRIGHT_MARK = new Blob("div.legal_footer span", "Â© " + new SimpleDateFormat("yyyy").format(new Date()) + " Acme LLC. All Rights Reserved.");
This is causing a test failure.
The Java source file is encoded as UTF-8. The project builds and the test passes locally without any problems. The Jenkins instance is running on OS X, and the code was also written on a Mac.
I'm stumped as to why the file is being altered when pulled into the workspace.
Any suggestions of what to check?
Upvotes: 0
Views: 91
Reputation: 140228
You need to declare, in whichever configuration file, parameter, or environment variable applies, that the encoding to be used is UTF-8. Having the file physically encoded as UTF-8 is only half the battle: every reader of the file needs to be told this as well.
There is no character injection; it is just a coincidence that the mojibake also contains the copyright character.
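As a starting point for what to check, here is a minimal Java sketch that reports which charset a JVM falls back to when none is specified, and reads a file with the charset stated explicitly. The class name `EncodingCheck` and its methods are hypothetical names made up for illustration; only standard JDK calls are used. Running it both locally and inside the Jenkins job should show whether the two environments disagree about the default.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project in the question.
public class EncodingCheck {

    // Report the charset this JVM uses when the caller does not name one.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset();
    }

    // Read a file while stating the encoding explicitly, so the result
    // does not depend on the platform default.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describeDefaults());
        if (args.length > 0) {
            // Print the length only, so the console encoding cannot distort the check.
            System.out.println("chars read as UTF-8: " + readUtf8(Paths.get(args[0])).length());
        }
    }
}
```

If the defaults differ, the usual places to declare UTF-8 are the compiler's `-encoding UTF-8` option or the JVM's `-Dfile.encoding=UTF-8` system property. Which one matters here depends on the build tool, which the question does not name, so treat this as an assumption to verify rather than a confirmed fix.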
You have encoded the file as UTF-8, so in reality the copyright character is stored as the bytes 0xC2 0xA9.
When a reader of this file knows to interpret the file as UTF-8, the character © will correctly appear.
However, if the reader of this file does not know what encoding to interpret the file in, it will most likely be interpreted incorrectly.
In your case the file was interpreted incorrectly, possibly as Windows-1252/cp1252/"ANSI" or ISO-8859-1. In those encodings 0xC2 0xA9 decodes to Â©, and all the other bytes decode to the same characters as in UTF-8, which is again a coincidence. If you had only used characters that map to the same bytes in both encodings, you wouldn't even notice there is a problem.
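The decoding mismatch described above can be reproduced in a few lines. This is a minimal sketch, not the code from the question: `MojibakeDemo` and its helper methods are hypothetical names, and ISO-8859-1 stands in for whatever charset the Jenkins side actually used. The copyright character is written as the escape `\u00A9`, and results are printed as code points, so the demo itself does not depend on the source or console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, made up for illustration.
public class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with another charset,
    // as a reader unaware of the real encoding would do.
    static String misread(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    // Render bytes in "0xC2 0xA9" style.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Render characters in "U+00C2 U+00A9" style.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        // UTF-8 stores the copyright sign as two bytes.
        System.out.println(hex(copyright.getBytes(StandardCharsets.UTF_8)));             // 0xC2 0xA9
        // Read as ISO-8859-1, each byte becomes its own character: Â followed by ©.
        System.out.println(codePoints(misread(copyright, StandardCharsets.ISO_8859_1))); // U+00C2 U+00A9
        // ASCII text survives the wrong decoding unchanged.
        System.out.println(codePoints(misread("Acme", StandardCharsets.ISO_8859_1)));    // U+0041 U+0063 U+006D U+0065
    }
}
```

The last line is why the rest of the string looks untouched: ASCII characters have the same single-byte encoding in UTF-8, ISO-8859-1, and Windows-1252, so only the non-ASCII character reveals the mismatch.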
Upvotes: 4