Gnat
Gnat

Reputation: 2921

How much instruction-level optimisation can a JIT apply?

To what extent can a JIT replace platform independent code with processor-specific machine instructions?

For example, the x86 instruction set includes the BSWAP instruction to reverse a 32-bit integer's byte order. In Java the Integer.reverseBytes() method is implemented using multiple bitwise masks and shifts, even though in x86 native code it could be implemented in a single instruction using BSWAP. Are JITs (or static compilers for that matter) able to make the change automatically or is it too complex or not worth it due to a poor speed/time tradeoff?

(I know that this is in most cases a micro-optimisation, but I'm interested none the less.)

Upvotes: 1

Views: 249

Answers (1)

Kuai Wei
Kuai Wei

Reputation: 86

For this case, yes, the hotspot server compiler could do this optimization. The reverseBytes() methods are registered as vmIntrinsics in hotspot. When jit compiler compile these methods, it will generate a special IR node, not compile the whole method. And this node will be translated into 'bswap' in x86. see src/share/vm/opto/library_call.cpp

//----------------------------   inline_reverseBytes_int/long/char/short-------------------
// inline Integer.reverseBytes(int)
// inline Long.reverseBytes(long)
// inline Character.reverseBytes(char)
// inline Short.reverseBytes(short)
bool LibraryCallKit::inline_reverseBytes(vmIntrinsics::ID id) {
  assert(id == vmIntrinsics::_reverseBytes_i || id == vmIntrinsics::_reverseBytes_l ||
         id == vmIntrinsics::_reverseBytes_c || id == vmIntrinsics::_reverseBytes_s,
         "not reverse Bytes");
  if (id == vmIntrinsics::_reverseBytes_i && !Matcher::has_match_rule(Op_ReverseBytesI))  return false;
  if (id == vmIntrinsics::_reverseBytes_l && !Matcher::has_match_rule(Op_ReverseBytesL))  return false;
  if (id == vmIntrinsics::_reverseBytes_c && !Matcher::has_match_rule(Op_ReverseBytesUS)) return false;
  if (id == vmIntrinsics::_reverseBytes_s && !Matcher::has_match_rule(Op_ReverseBytesS))  return false;
  _sp += arg_size();        // restore stack pointer
  switch (id) {
  case vmIntrinsics::_reverseBytes_i:
    push(_gvn.transform(new (C, 2) ReverseBytesINode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_l:
    push_pair(_gvn.transform(new (C, 2) ReverseBytesLNode(0,pop_pair())));
    break;
  case vmIntrinsics::_reverseBytes_c:
    push(_gvn.transform(new (C, 2) ReverseBytesUSNode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_s:
    push(_gvn.transform(new (C, 2) ReverseBytesSNode(0, pop())));
    break;
  default:
;
  }
  return true;
}

and src/cpu/x86/vm/x86_64.ad

instruct bytes_reverse_int(rRegI dst) %{
  match(Set dst (ReverseBytesI dst));

  format %{ "bswapl  $dst" %}
  opcode(0x0F, 0xC8);  /*Opcode 0F /C8 */
  ins_incode( REX_reg(dst), OpcP, opc2_reg(dst) );
  ins_pipe( ialu_reg );
%}

Upvotes: 1

Related Questions