vkraemer

Reputation: 9902

Finding the default file encoding of a remote jvm

I need to find out what the default file encoding is on a remote Java vm, in a Java program.

Is there a way to execute Charset.defaultCharset() on the remote vm and get its value back... without altering the program running on the remote jvm?

Update:

I am trying to find out what the default Charset is for a WebLogic 11g or WebLogic 12c server... that I did not start, cannot restart and I do not have the 'right' to deploy code onto it.

I also need to be able to determine the default Charset of the server process from inside a Java program that I am writing. It may execute on the same machine as the server... or not. It is very doubtful that the server and my program will start with the same environment.

I would prefer a method which depends on very few assumptions... so that usually means more code...

I probably cannot execute Charset.defaultCharset() on the server... so I should not have said 'execute Charset.defaultCharset()'. Sorry about that folks. I need to do something that will provide the answer that is as correct as executing Charset.defaultCharset() from inside the server process.

Upvotes: 1

Views: 1884

Answers (3)

vkraemer

Reputation: 9902

Here is what I ended up doing... (roughly)

  // conn is an already-open javax.management.remote.JMXConnector
  MBeanServerConnection mbs = conn.getMBeanServerConnection();
  ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
  // SystemProperties comes back as tabular data, one row per property
  TabularDataSupport foo = 
    (TabularDataSupport) mbs.getAttribute(runtime, "SystemProperties");
  String retVal = null;
  for (Iterator<Object> it = foo.values().iterator(); 
                      it.hasNext() && null == retVal; ) {
    CompositeDataSupport cds = (CompositeDataSupport) it.next();
    // each row holds the property name followed by its value
    for (Iterator<?> iter = cds.values().iterator() ; 
                   iter.hasNext() && null == retVal ;) {
      if ("file.encoding".equals(iter.next()) && iter.hasNext())
        retVal = iter.next().toString();
    }
  }

I connected to the MBeanServer and then worked through the SystemProperties to find the file.encoding for the process on the other end of the connection.
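
For reference, the same lookup can be written without the nested loops, because the SystemProperties table of the standard RuntimeMXBean is indexed by property name. This is only a minimal sketch: the class and method names (RemoteFileEncoding, fileEncoding) are made up for illustration, and the demo in main queries the local platform MBean server instead of a remote one so that it runs without any server.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.openmbean.TabularData;

// Hypothetical helper class, not part of any library.
final class RemoteFileEncoding {

    /** Returns file.encoding of the JVM behind mbs, or null if the property is absent. */
    public static String fileEncoding(MBeanServerConnection mbs) throws Exception {
        ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
        TabularData props = (TabularData) mbs.getAttribute(runtime, "SystemProperties");
        // Rows are keyed by property name and carry the items "key" and "value".
        CompositeData row = props.get(new Object[] {"file.encoding"});
        return row == null ? null : (String) row.get("value");
    }

    public static void main(String[] args) throws Exception {
        // Local demo only: the platform MBeanServer is also an MBeanServerConnection.
        System.out.println(fileEncoding(ManagementFactory.getPlatformMBeanServer()));
    }
}
```

Against a remote server, the MBeanServerConnection would come from a JMXConnector (JMXConnectorFactory.connect with a JMXServiceURL). The URL, protocol and credentials depend on how the server exposes JMX, so they are left out here.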

Upvotes: 0

unthought

Reputation: 661

Edit: After writing my answer, I discovered that it's at least partially based on a faulty assumption, in that Charset.defaultCharset() isn't guaranteed to always return the same value. Some of the approaches below should still work provided that they're tried on the same host as the target application, but I recommend also reading the first two answers of this question for more background.

In particular it might be easier to forcibly override file.encoding instead of trying to figure out what it actually is.


As the javadoc of defaultCharset states:

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

Meaning that defaultCharset() is read-only inside the JVM process and will return the same charset for all JVM processes started on the same machine unless their environment has been changed explicitly prior to starting the process (e.g. a wrapper/launcher script starting the JVM and setting a different locale for the current process and its children). If you're sure that the two processes are started in the same way, then Charset.defaultCharset() in your own process should return the same Charset as in the application you're asking about.

With that as a backdrop, and in increasing order of annoyance/effort:

  1. If your host is running Unix/Linux, try procfs. E.g. /proc/<vmpid>/environ and /proc/<vmpid>/cmdline (on Linux) would be great places to start because they show you how the process was actually started, without the obfuscation of a wrapper script. This solution also gets bonus points because you don't need to restart or alter the application to inspect it. Things to look out for: LANG and LC_* variables (intro to locale on Linux) and JVM command line parameters affecting the locale. Other operating systems will likely also have some form of process inspection that you can use to show this information.

  2. Next up: Compile and run this on the particular host/JVM:

    import java.nio.charset.Charset;
    
    public class DumpCharset {
      public static void main(String[] args) {
        System.out.println(Charset.defaultCharset().displayName());
      }
    }
    

    As mentioned, if the processes are started in the same way, Charset.defaultCharset() should return the same value (on the same host). To get very close, you could even replace the application's jar containing the main method with a jar containing the above code temporarily (make sure the class names match).

  3. If that doesn't give you the information you need (it should), try launching the process so it accepts a debugger, attach the debugger, and then drill down into the locale, and/or execute expressions similar to the above code.

  4. If that still doesn't give you the info you need, then you can go radical and use dynamic bytecode weaving at class-load time. This could be achieved with an existing AOP framework based on load-time weaving (e.g. AspectJ), or directly with ASM 4 and the java.lang.instrument API. Be aware that there are pitfalls in making this work, so it's hard to judge whether this is going to be reasonably straightforward or not in your case. But expect it to be (much?) more work than the above methods.
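
To make the procfs idea from item 1 concrete, here is a rough sketch that pulls the locale-related variables out of /proc/<vmpid>/environ. It assumes a Linux host, that your program runs on that same host, and that it is allowed to read the target process's environ file (typically same user or root). ProcLocale and localeVars are names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of any library.
final class ProcLocale {

    /** Picks LANG and LC_* out of the NUL-separated NAME=value entries of an environ file. */
    public static Map<String, String> localeVars(byte[] environ) {
        Map<String, String> vars = new TreeMap<>();
        // ISO-8859-1 maps every byte to one char, so no entry is lost to decoding errors.
        String text = new String(environ, StandardCharsets.ISO_8859_1);
        for (String entry : text.split("\0")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            String name = entry.substring(0, eq);
            if (name.equals("LANG") || name.startsWith("LC_")) {
                vars.put(name, entry.substring(eq + 1));
            }
        }
        return vars;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the pid of the target JVM.
        byte[] raw = Files.readAllBytes(Paths.get("/proc", args[0], "environ"));
        localeVars(raw).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

This only shows the environment the process was started with. A -Dfile.encoding=... argument in /proc/<vmpid>/cmdline would still need to be checked separately.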

Upvotes: 3

Aubin

Reputation: 14873

I suggest using System.getProperty( "os.name" ) and System.getProperty( "os.arch" ) to identify the remote architecture.

The default Charset may be useful too:

java.nio.charset.Charset cs = java.nio.charset.Charset.defaultCharset();

Upvotes: 0
