What is the binary-to-text encoding used by protoc --decode?

Question

I am looking at the output of the protoc --decode command and cannot work out which encoding it uses when it prints a bytes field:

data {
  image: "\377\330\377\340\000\020JFIF\000\001[…]\242\2634G\377\331"
}

The […] was added by me to shorten the output.

What encoding is this?

Edit

Based on Bruce's answer, I wrote my own utility to generate sample data from a shell script:

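package protobuf;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import com.google.protobuf.ByteString;
import com.google.protobuf.TextFormat;
public class BinarySerializer {
    public static void main(String[] parameters) throws IOException {
        File binaryInput = new File(parameters[0]);
        System.out.println("\""+TextFormat.escapeBytes(ByteString.readFrom(new FileInputStream(binaryInput)))+"\"");
    }
}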

That way I can serialize my binaries and insert them into the text serialization of a protobuf message before calling protoc --encode on it:

IMAGE=$(mktemp)
OUTPUT=$(mktemp)
BIN_INSTANCE=$(mktemp)

echo -n 'capture: ' > $IMAGE
java -cp "$HOME/.m2/repository/com/google/protobuf/protobuf-java/3.0.0/protobuf-java-3.0.0.jar:target/protobuf-generator-1.0.0-SNAPSHOT.jar" protobuf.BinarySerializer image.jpg >> $IMAGE
sed -e 's/{UUID}/'$(uuidgen)'/' template.protobuf > $OUTPUT
sed -i '/{IMAGE}/ {
    r '$IMAGE'
    d
    }' $OUTPUT
cat $OUTPUT | protoc --encode=prototypesEvent.proto> $BIN_INSTANCE

with template.protobuf being:

uuid: "{UUID}"
image {
    capture: "{IMAGE}"
}

Bruce Martin · Accepted Answer

I presume it is the same escaping that the Java library produces.

Basically:

between space (0x20) and tilde (0x7e) treat it as an ASCII character
if there is a shortcut (e.g. \n, \r, \\ etc.) use the shortcut
otherwise escape the byte as three octal digits

So in the above, \377 is one byte: 377 octal, or 255 in decimal.

"\377\330\377\340" = 255 216 255 224

You should be able to copy the string into a Java or C program and convert it to bytes.
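If you would rather decode the text at run time than paste it into a string literal, the rules above can be reversed by hand. The following is only a minimal sketch: the class name ProtoBytesUnescaper and its unescape method are made up for this example and are not part of protobuf, it assumes the input is plain ASCII as printed by protoc --decode, and it only handles the shortcuts and octal escapes discussed here (anything else, such as a hex escape, is rejected).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of protobuf: turns the escaped text that
// protoc --decode prints for a bytes field back into raw bytes.
final class ProtoBytesUnescaper {

    static byte[] unescape(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i++);
            if (c != '\\') {
                // Assumption: unescaped characters are printable ASCII.
                out.write(c);
                continue;
            }
            if (i >= text.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = text.charAt(i++);
            if (e >= '0' && e <= '7') {
                // Octal escape: one to three octal digits form one byte.
                int value = e - '0';
                for (int n = 0; n < 2 && i < text.length()
                        && text.charAt(i) >= '0' && text.charAt(i) <= '7'; n++) {
                    value = value * 8 + (text.charAt(i++) - '0');
                }
                out.write(value); // write(int) keeps the low 8 bits
                continue;
            }
            switch (e) {
                case 'a': out.write(0x07); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                case 'n': out.write('\n'); break;
                case 'r': out.write('\r'); break;
                case 't': out.write('\t'); break;
                case 'v': out.write(0x0b); break;
                case '\\': out.write('\\'); break;
                case '\'': out.write('\''); break;
                case '"': out.write('"'); break;
                default:
                    throw new IllegalArgumentException("unsupported escape: \\" + e);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // In Java source every backslash is doubled to get the literal text.
        byte[] bytes = unescape("\\377\\330\\377\\340\\000\\020JFIF");
        for (byte b : bytes) {
            System.out.print((b & 0xff) + " ");
        }
        System.out.println();
    }
}
```

Run on the start of the image from the question, this prints 255 216 255 224 0 16 74 70 73 70, i.e. the JPEG marker bytes followed by the ASCII codes for JFIF. If the protobuf library is already on the classpath, its own text-format parser is likely the better choice than a hand-written decoder.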

The Java code looks to be:

  static String escapeBytes(final ByteSequence input) {
    final StringBuilder builder = new StringBuilder(input.size());
    for (int i = 0; i < input.size(); i++) {
      final byte b = input.byteAt(i);
      switch (b) {
        // Java does not recognize \a or \v, apparently.
        case 0x07: builder.append("\\a"); break;
        case '\b': builder.append("\\b"); break;
        case '\f': builder.append("\\f"); break;
        case '\n': builder.append("\\n"); break;
        case '\r': builder.append("\\r"); break;
        case '\t': builder.append("\\t"); break;
        case 0x0b: builder.append("\\v"); break;
        case '\\': builder.append("\\\\"); break;
        case '\'': builder.append("\\\'"); break;
        case '"' : builder.append("\\\""); break;
        default:
          // Only ASCII characters between 0x20 (space) and 0x7e (tilde) are
          // printable.  Other byte values must be escaped.
          if (b >= 0x20 && b <= 0x7e) {
            builder.append((char) b);
          } else {
            builder.append('\\');
            builder.append((char) ('0' + ((b >>> 6) & 3)));
            builder.append((char) ('0' + ((b >>> 3) & 7)));
            builder.append((char) ('0' + (b & 7)));
          }
          break;
      }
    }
    return builder.toString();
  }

taken from com.google.protobuf.TextFormatEscaper
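The octal branch is the part that produces \377 and friends, and it may look odd because Java bytes are signed. As a small stand-alone illustration (the class OctalEscapeDemo and the method octalEscape are hypothetical names for this sketch, not protobuf code), the same three shifts and masks can be tried on single bytes:

```java
// Hypothetical stand-alone demo, not protobuf code: reproduces only the
// octal branch of escapeBytes for a single byte.
final class OctalEscapeDemo {

    static String octalEscape(byte b) {
        StringBuilder builder = new StringBuilder(4);
        builder.append('\\');
        // b is promoted to int before shifting. For negative bytes the upper
        // bits are all ones, so the masks keep only the wanted octal digit.
        builder.append((char) ('0' + ((b >>> 6) & 3))); // top 2 bits
        builder.append((char) ('0' + ((b >>> 3) & 7))); // middle 3 bits
        builder.append((char) ('0' + (b & 7)));         // low 3 bits
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(octalEscape((byte) 0xff));
        System.out.println(octalEscape((byte) 0xd8));
        System.out.println(octalEscape((byte) 0x10));
    }
}
```

The three calls print \377, \330 and \020, which match the escapes at the start of the image field in the question.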
