Reputation: 179
I have a serialized .bin file of protobuf data, written mainly with protobuf-net. I want to decompile it and see its structure.
I tried some tools, such as:
https://protogen.marcgravell.com/decode
and I also used protoc:
protoc --decode_raw < ~/Downloads/file.bin
and this is part of the result I get:
1 {
  1: "4f81b7bb-d8bd-e911-9c1f-06ec640006bb"
  2: 0x404105b1663ef93a
  3: 0x4049c6158c593f36
  4: 0x40400000
  5 {
    1: "53f8afde-04c6-e811-910e-4622e9d1766e"
    2 {
      1: "e993fba0-8fc9-e811-9c15-06ec640006bb"
    }
    2 {
      1: "9a7c7210-3aca-e811-9c15-06ec640006bb"
      2: 1
    }
    2 {
      1: "2d7d12f1-2bc9-e811-9c15-06ec640006bb"
    }
    3: 18446744073709551615
  }
  6: 46
  7: 1571059279000
}
How can I decompile it? I want to know the structure, change some data in it, and make a new bin file.
Upvotes: 1
Views: 583
Reputation: 1062820
Reverse engineering a .proto file is mostly a case of looking at the output of the tools such as you've mentioned, and trying to write a .proto that looks similar. Unfortunately, a number of concepts are ambiguous if you don't know the schema, as multiple different data types and shapes share the same encoding details, but... we can make guesses.
Looking at your output:
1 {
...
}
tells us that our root message probably has a sub-message at field 1; so:
message Root {
    repeated Foo Foos = 1;
}
(I'm guessing at the repeated here; if the 1 only appears once, it could be single)
with everything at the next level being our Foo:
1: "4f81b7bb-d8bd-e911-9c1f-06ec640006bb"
2: 0x404105b1663ef93a
3: 0x4049c6158c593f36
4: 0x40400000
5 { ... }
6: 46
7: 1571059279000
this looks like it could be
message Foo {
    string A = 1;
    sfixed64 B = 2;
    sfixed64 C = 3;
    sfixed32 D = 4;
    repeated Bar E = 5; // again, might not be "repeated" - see how many times it occurs
    int64 F = 6;
    int64 G = 7;
}
However, those sfixed64 could be double or fixed64, and those sfixed32 could be fixed32 or float; likewise, the int64 could be sint64 or uint64 - or int32, sint32, uint32 or bool, and I wouldn't be able to tell (they are all just "varint"). Each option gives a different meaning to the value!
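To make that ambiguity concrete, here is a minimal Python sketch (standard library only) that takes a few raw values from the question's output and shows what each candidate type would make of them. The helper names (fixed64_views, fixed32_views, varint_views, as_utc_millis) are made up for this illustration, and reading field 7 as milliseconds since the Unix epoch is purely a guess.

```python
import struct
from datetime import datetime, timezone

# Raw values copied from the decode_raw output in the question.
raw64 = 0x404105B1663EF93A          # Foo field 2 (64-bit fixed-width)
raw32 = 0x40400000                  # Foo field 4 (32-bit fixed-width)
raw_varint = 18446744073709551615   # Bar field 3 (varint)


def fixed64_views(bits):
    """Reinterpret the same 8 little-endian bytes as fixed64, sfixed64 and double."""
    data = struct.pack("<Q", bits)
    return {
        "fixed64": bits,
        "sfixed64": struct.unpack("<q", data)[0],
        "double": struct.unpack("<d", data)[0],
    }


def fixed32_views(bits):
    """Reinterpret the same 4 little-endian bytes as fixed32, sfixed32 and float."""
    data = struct.pack("<I", bits)
    return {
        "fixed32": bits,
        "sfixed32": struct.unpack("<i", data)[0],
        "float": struct.unpack("<f", data)[0],
    }


def varint_views(value):
    """Read one decoded varint as uint64, int64 (two's complement) or sint64 (zigzag)."""
    as_int64 = value - (1 << 64) if value >= (1 << 63) else value
    as_sint64 = (value >> 1) ^ -(value & 1)
    return {"uint64": value, "int64": as_int64, "sint64": as_sint64}


def as_utc_millis(value):
    """Assumption: the value is milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(fixed64_views(raw64))
    print(fixed32_views(raw32))
    print(varint_views(raw_varint))
    print(as_utc_millis(1571059279000))
```

With these inputs, field 4 reads as exactly 3.0 when treated as a float, and field 2 reads as a little over 34 when treated as a double, which makes float and double look more plausible than the integer readings. Likewise 18446744073709551615 is -1 as an int64, and field 7 lands in October 2019 if it really is a millisecond timestamp.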
Our Bar definitely has some kind of repeated, because of all the 2 entries:
1: "53f8afde-04c6-e811-910e-4622e9d1766e"
2 { ... }
2 { ... }
2 { ... }
3: 18446744073709551615
let's guess at:
message Bar {
    string A = 1;
    repeated Blap B = 2;
    int64 C = 3;
}
And finally, looking at the 2 from the previous bit, we have:
1: "e993fba0-8fc9-e811-9c15-06ec640006bb"
and
1: "9a7c7210-3aca-e811-9c15-06ec640006bb"
2: 1
and
1: "2d7d12f1-2bc9-e811-9c15-06ec640006bb"
so combining those, we might guess:
message Blap {
    string A = 1;
    int64 B = 2;
}
Depending on whether you have more data, there may be additional fields, or you may be able to infer more context. For example, if an int64 value such as Blap.B is always 1 or omitted, it might actually be a bool. If one of the repeated elements always has at most one value, it might not be repeated.
The trick is to play with it until you can deserialize the data, re-serialize it, and get the exact same payload (i.e. round-trip).
Once you have that: you'll want to deserialize it, mutate the thing you wanted to change, and serialize.
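The round-trip check can be sketched without any schema at all. The Python below is not protobuf-net and not the official protobuf library; it is a minimal, hypothetical reader/writer for the wire format (the names read_varint, write_varint, parse and serialize are invented here) that handles wire types 0, 1, 2 and 5 and keeps length-delimited fields as opaque bytes. The sample payload is hand-built to look like one Blap entry from the output above.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def parse(buf):
    """Split one message into (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:                     # 64-bit fixed, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:                     # length-delimited, kept as raw bytes
            size, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        elif wire == 5:                     # 32-bit fixed, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        fields.append((field, wire, value))
    return fields


def serialize(fields):
    """Inverse of parse: rebuild the payload from the tuples."""
    out = bytearray()
    for field, wire, value in fields:
        out += write_varint((field << 3) | wire)
        if wire == 0:
            out += write_varint(value)
        elif wire == 2:
            out += write_varint(len(value)) + value
        else:
            out += value
    return bytes(out)


# Hand-built sample: field 1 = 36-byte string, field 2 = varint 1.
sample = b"\x0a\x24" + b"9a7c7210-3aca-e811-9c15-06ec640006bb" + b"\x10\x01"

fields = parse(sample)
assert serialize(fields) == sample          # round trip is byte-identical

fields[1] = (2, 0, 5)                       # mutate field 2, then re-serialize
edited = serialize(fields)
```

To edit something inside a nested message with this approach, you would call parse on the bytes of the length-delimited field, change the inner tuples, and pass the re-serialized bytes back up, so the outer length prefix is recomputed by serialize. For real work, a guessed .proto plus protobuf-net (or protoc) is the more practical route; this only shows what "round-trip" means at the byte level.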
Upvotes: 3