Reputation: 3358
I am trying to use serde together with bincode to deserialize an arbitrary Bitcoin network message. Given that the payload is handled ubiquitously as a byte array, how do I deserialize it when its length is unknown at compile time? By default, bincode handles Vec<u8> by assuming its length is encoded as a u64 right before the elements of the vector. However, this assumption does not hold here because the length is a u32 and the checksum sits between the length and the payload.
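To make the mismatch concrete, here is a minimal std-only sketch of the two byte layouts. It assumes bincode's fixed-width, little-endian configuration (a u64 length directly before the elements); both function names are hypothetical and exist only for illustration.

```rust
/// Layout that bincode (fixed-width, little-endian) is assumed to expect
/// for a Vec<u8>: a u64 length immediately followed by the elements.
fn bincode_style_vec(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u64).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Tail of a Bitcoin message as described in the question:
/// a u32 length, then a u32 checksum, and only then the payload.
fn bitcoin_style_tail(checksum: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(&checksum.to_le_bytes());
    out.extend_from_slice(payload);
    out
}
```

For a two-byte payload, the first layout starts with eight length bytes, while the second starts with four length bytes followed by four checksum bytes, so a Vec<u8> reader would misinterpret the checksum as part of the length.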
I have the following working solution:
Cargo.toml
[package]
name = "serde-test"
version = "0.1.0"
edition = "2018"
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_bytes = "0.11"
bincode = "1.3.3"
main.rs
use bincode::Options;
use serde::{Deserialize, Deserializer, de::{SeqAccess, Visitor}};

#[derive(Debug)]
struct Message {
    // https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
    magic: u32,
    command: [u8; 12],
    length: u32,
    checksum: u32,
    payload: Vec<u8>,
}

struct MessageVisitor;

impl<'de> Visitor<'de> for MessageVisitor {
    type Value = Message;

    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
        formatter.write_str("Message")
    }

    fn visit_seq<V>(self, mut seq: V) -> Result<Self::Value, V::Error>
    where
        V: SeqAccess<'de>,
    {
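        // Report a deserialization error instead of panicking when the input is too short.
        let magic = seq.next_element()?.ok_or_else(|| serde::de::Error::invalid_length(0, &self))?;
        let command = seq.next_element()?.ok_or_else(|| serde::de::Error::invalid_length(1, &self))?;
        let length: u32 = seq.next_element()?.ok_or_else(|| serde::de::Error::invalid_length(2, &self))?;
        let checksum = seq.next_element()?.ok_or_else(|| serde::de::Error::invalid_length(3, &self))?;
        let mut payload = Vec::new();
        for i in 0..length as usize {
            let byte: u8 = seq.next_element()?.ok_or_else(|| serde::de::Error::invalid_length(4 + i, &self))?;
            payload.push(byte);
        }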
        // verify payload checksum (omitted for brevity)
        Ok(Message { magic, command, length, checksum, payload })
    }
}

impl<'de> Deserialize<'de> for Message {
    fn deserialize<D>(deserializer: D) -> Result<Message, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_tuple(5000, MessageVisitor) // <-- overallocation
    }
}

fn main() {
    let bytes = b"\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00e\x00\x00\x00_\x1ai\xd2r\x11\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc\x8f^T\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc6\x1bd\t \x8d\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xcb\x00q\xc0 \x8d\x12\x805\xcb\xc9yS\xf8\x0f/Satoshi:0.9.3/\xcf\x05\x05\x00\x01";
    let msg: Message = bincode::DefaultOptions::new().with_fixint_encoding().deserialize(bytes).unwrap();
    println!("{:?}", msg);
}
Output:
Message { magic: 3652501241, command: [118, 101, 114, 115, 105, 111, 110, 0, 0, 0, 0, 0], length: 101, checksum: 3530103391, payload: [114, 17, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 188, 143, 94, 84, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 198, 27, 100, 9, 32, 141, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 203, 0, 113, 192, 32, 141, 18, 128, 53, 203, 201, 121, 83, 248, 15, 47, 83, 97, 116, 111, 115, 104, 105, 58, 48, 46, 57, 46, 51, 47, 207, 5, 5, 0, 1] }
I dislike this solution because of how payload is handled. It requires me to allocate some "large enough" buffer to account for the dynamic size of the payload; in the code snippet above, 5000 is sufficient. I would much rather deserialize payload as a single element and use deserializer.deserialize_tuple(5, MessageVisitor) instead.
Is there a way to handle this kind of deserialization in a succinct manner?
Similar question I could find: Can I deserialize vectors with variable length prefix with Bincode?
Upvotes: 3
Views: 1557
Reputation: 58785
Your problem is that the source message is not encoded as bincode, so you are doing weird things to treat non-bincode data as if it were.
Serde is designed for creating serializers and deserializers for general-purpose formats, but your message is in a very specific format that can only be interpreted one way.
A library like nom is much more suitable for this kind of work, but it may be overkill considering how simple the format is. You can just parse it from the bytes directly:
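use std::convert::TryInto;

// Same layout as the struct in the question.
#[derive(Debug)]
struct Message {
    magic: u32,
    command: [u8; 12],
    length: u32,
    checksum: u32,
    payload: Vec<u8>,
}
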
fn main() {
    let bytes = b"\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00e\x00\x00\x00_\x1ai\xd2r\x11\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc\x8f^T\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc6\x1bd\t \x8d\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xcb\x00q\xc0 \x8d\x12\x805\xcb\xc9yS\xf8\x0f/Satoshi:0.9.3/\xcf\x05\x05\x00\x01";

    let (magic_bytes, bytes) = bytes.split_at(4);
    let magic = u32::from_le_bytes(magic_bytes.try_into().unwrap());

    let (command_bytes, bytes) = bytes.split_at(12);
    let command = command_bytes.try_into().unwrap();

    let (length_bytes, bytes) = bytes.split_at(4);
    let length = u32::from_le_bytes(length_bytes.try_into().unwrap());

    let (checksum_bytes, bytes) = bytes.split_at(4);
    let checksum = u32::from_le_bytes(checksum_bytes.try_into().unwrap());

    let payload = bytes[..length as usize].to_vec();

    let msg = Message {
        magic,
        command,
        length,
        checksum,
        payload,
    };
    println!("{:?}", msg);
}
There are hundreds of cryptocurrency projects in Rust, and many crates already exist for handling cryptocurrency data structures. These crates are battle-tested and will have much better error handling (my example above has none). As mentioned in the comments, you could look at the bitcoin crate.
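For illustration only, the same manual parse can be written so that truncated input yields an error instead of a panic. This is a minimal sketch under stated assumptions, not a replacement for a tested crate: ParseError, take, read_u32_le and parse_message are hypothetical names introduced here, and checksum verification is still omitted.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct Message {
    magic: u32,
    command: [u8; 12],
    length: u32,
    checksum: u32,
    payload: Vec<u8>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Fewer bytes were available than the next field requires.
    UnexpectedEof { needed: usize, available: usize },
}

/// Splits `n` bytes off the front of `input`, or reports how many were available.
fn take(input: &[u8], n: usize) -> Result<(&[u8], &[u8]), ParseError> {
    if input.len() < n {
        return Err(ParseError::UnexpectedEof { needed: n, available: input.len() });
    }
    Ok(input.split_at(n))
}

/// Reads a little-endian u32 from the front of `input`.
fn read_u32_le(input: &[u8]) -> Result<(u32, &[u8]), ParseError> {
    let (head, rest) = take(input, 4)?;
    // `head` is exactly 4 bytes long here, so the conversion cannot fail.
    Ok((u32::from_le_bytes(head.try_into().unwrap()), rest))
}

/// Parses one message: magic, command, length, checksum, then `length` payload bytes.
fn parse_message(input: &[u8]) -> Result<Message, ParseError> {
    let (magic, rest) = read_u32_le(input)?;
    let (command_bytes, rest) = take(rest, 12)?;
    // `command_bytes` is exactly 12 bytes long, so the conversion cannot fail.
    let command: [u8; 12] = command_bytes.try_into().unwrap();
    let (length, rest) = read_u32_le(rest)?;
    let (checksum, rest) = read_u32_le(rest)?;
    let (payload, _trailing) = take(rest, length as usize)?;
    // Checksum verification is deliberately left out of this sketch.
    Ok(Message { magic, command, length, checksum, payload: payload.to_vec() })
}
```

Calling parse_message on the byte string from the question should give the same fields as the version above, while a slice that ends in the middle of the header or payload returns ParseError::UnexpectedEof rather than panicking.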
Upvotes: 3