Reputation: 21356
The Logback 1.1.3 LayoutWrappingEncoder documentation doesn't indicate what the default charset will be if the user doesn't set it, but the source code says:
By default this property has the value null which corresponds to the system's default charset.
However I'm using a PatternLayoutEncoder (with a RollingFileAppender), and it seems to be outputting files in UTF-8 (and the default charset of my Windows 7 Professional system is probably not UTF-8).
UTF-8 output is actually what I want, but I want to make sure I'm not getting this by chance, since the documentation seems to indicate something else. So why is Logback giving me UTF-8 output when I haven't explicitly specified a charset?
Upvotes: 13
Views: 20052
Reputation: 2884
Logback Character Encoding
You can use <charset> in the definition of your PatternLayoutEncoder, as this is a subclass of LayoutWrappingEncoder, which provides the setCharset method. The documentation indicates this with an excerpt from the class, but gives no example XML configuration. For the LayoutWrappingEncoder, an answer has been given here: [Logback-user]: How to use UTF-8.
So if you configure via code, you can call the setCharset method with the UTF-8 charset. If you are configuring via XML, this is:
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
    <charset>UTF-8</charset>
    <outputPatternAsHeader>true</outputPatternAsHeader>
    <pattern>[%thread] %-5level %logger{35} - %msg%n</pattern>
</encoder>
Default File Encoding
Logback's documentation is correct in stating that the default character encoding is used. The default character set is not typically UTF-8 on Windows (mine is windows-1252, for instance). The correct thing to do is to configure Logback to use UTF-8 as above. Even if Logback is picking UTF-8 up from somewhere, or file.encoding is somehow being set by you, there's no guarantee that this will happen in the future.
Incidentally, if you are setting file.encoding on an Oracle VM, note what Sun previously said about this property:
The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.
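One way to see why the property should be treated as read-only is to change it at run time and check whether the JVM's default charset follows. The sketch below assumes a typical HotSpot/OpenJDK runtime, where the default charset is resolved once and then cached. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

public class FileEncodingReadOnlyDemo {

    /** Returns true if changing file.encoding at run time left the default charset untouched. */
    static boolean defaultCharsetIgnoresRuntimeChange() {
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");
        // Pick a value that differs from the current default.
        String other = before.name().equalsIgnoreCase("UTF-8") ? "ISO-8859-1" : "UTF-8";
        try {
            System.setProperty("file.encoding", other);
            return Charset.defaultCharset().equals(before);
        } finally {
            // Restore the property so the rest of the program sees the original value.
            if (original != null) {
                System.setProperty("file.encoding", original);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset unchanged: " + defaultCharsetIgnoresRuntimeChange());
    }
}
```

If the method reports that the default charset did not change, that matches the quote: setting the property from user code is not a reliable way to control encoding.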
Eclipse and Maven
If you are running Maven from Eclipse and you've already set your environment to be UTF-8, either for the environment/project or in the Run Configuration (for me, in the Common tab), then Eclipse will arrange for the new JVM to have UTF-8 encoding by setting file.encoding. See: Eclipse's encoding documentation
Upvotes: 21
Reputation: 6538
The system's default charset is determined by Java and set in the system property file.encoding, but this property can also be specified as the JVM starts up (more in this answer). Eclipse, NetBeans, Maven, etc. can use this system property to set the default charset to UTF-8 and that is probably why output is in UTF-8 even though you did not specify it.
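To check which charset a particular JVM actually ended up with, you can print it from inside the process that does the logging (for example, once from the IDE and once from a plain command line). This is a minimal sketch using only the standard library. The class name is hypothetical.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    /** Name of the charset the JVM uses when none is specified explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running this under the same launcher as the application (Eclipse run configuration, Maven, a service wrapper) should show whether UTF-8 is coming from the tool rather than from the operating system.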
To remove the element of chance, specify the character set for logging as shown in this answer. Logback's source code shows how the character set is used to convert the Strings to bytes to write to file in the convertToBytes method (more on Strings to bytes is explained in this answer).
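The effect of that conversion can be illustrated without Logback. The sketch below is not Logback's code. It only mirrors the idea described above: use the configured charset if there is one, otherwise fall back to the JVM default. ISO-8859-1 is used here as a stand-in for a single-byte Western encoding, and the class and method names are assumptions for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConvertToBytesSketch {

    /** Encodes with the given charset, or with the JVM default when charset is null. */
    static byte[] toBytes(String s, Charset charset) {
        return charset == null ? s.getBytes() : s.getBytes(charset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, written as an escape
        // so the encoding of this source file does not matter.
        String text = "caf\u00e9";
        System.out.println("UTF-8:      " + toBytes(text, StandardCharsets.UTF_8).length + " bytes");
        System.out.println("ISO-8859-1: " + toBytes(text, StandardCharsets.ISO_8859_1).length + " bytes");
        System.out.println("default:    " + toBytes(text, null).length + " bytes");
    }
}
```

The same four-character string becomes five bytes in UTF-8 and four in ISO-8859-1, so a log file written with the default charset can differ from machine to machine.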
On Unix, the value for file.encoding is determined using the environment variables (e.g. via LANG=en_US.UTF-8 as explained here, but other environment variables can be involved as well).
On Windows, the default code page is shown with the command chcp. The code page number corresponds with a character set shown in this list. For example, code page 65001 corresponds with UTF-8. The default locale is shown with the command systeminfo | findstr Locale.
In short: once your software leaves your development environment, you cannot assume any specific default character set. Therefore, always specify a character set.
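The same rule applies to any file I/O in your own code, not only to Logback. As a rough sketch (standard library only, hypothetical class name), this writes and reads a file with an explicit charset so the result does not depend on the platform default:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetIo {

    /** Writes text as UTF-8 regardless of the platform default, then reads it back the same way. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("charset-demo", ".log");
            try {
                try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                // Clean up the temporary file whether or not the round trip succeeded.
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // non-ASCII characters written as escapes
        System.out.println("Round trip intact: " + text.equals(roundTrip(text)));
    }
}
```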
Upvotes: 5