Simon Cozens

Reputation: 765

serde: Stateful deserialisation at top level

I'm trying to deserialise a binary format (OpenType) which consists of data in multiple tables (binary structs). I would like to be able to deserialise the tables independently (because of how they're stored in the top-level file structure; imagine them being in separate files, so they have to be deserialised separately), but sometimes there are dependencies between them.

A simple example is the loca table, which contains an array of either 16-bit or 32-bit offsets, depending on the value of the indexToLocFormat field in the head table. As a more complex example, these loca table offsets are in turn used as offsets into the binary data of the glyf table to locate elements. So I need to get indexToLocFormat and loca: Vec<u32> "into" the deserializer somehow.
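To make the dependency concrete, here is a minimal std-only sketch of what the decoding has to do, leaving serde out entirely. The names decode_loca and glyph_range are hypothetical helpers made up for illustration. The sketch assumes big-endian data and that short-format entries store the real offset divided by two, as the OpenType spec describes:

```rust
use std::ops::Range;

/// Decode raw `loca` bytes into byte offsets into the `glyf` table.
/// `long_format` mirrors head.indexToLocFormat (false = 16-bit, true = 32-bit).
/// Returns None if the input length is not a whole number of entries.
fn decode_loca(bytes: &[u8], long_format: bool) -> Option<Vec<u32>> {
    if long_format {
        if bytes.len() % 4 != 0 {
            return None;
        }
        // Long format: each entry is the actual offset.
        Some(
            bytes
                .chunks_exact(4)
                .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    } else {
        if bytes.len() % 2 != 0 {
            return None;
        }
        // Short format: each entry is the actual offset divided by two.
        Some(
            bytes
                .chunks_exact(2)
                .map(|c| u32::from(u16::from_be_bytes([c[0], c[1]])) * 2)
                .collect(),
        )
    }
}

/// Byte range of glyph `i` inside `glyf`, given the decoded offsets.
/// Glyph `i` runs from offsets[i] up to (not including) offsets[i + 1].
fn glyph_range(offsets: &[u32], i: usize) -> Option<Range<usize>> {
    let start = *offsets.get(i)? as usize;
    let end = *offsets.get(i + 1)? as usize;
    if start <= end {
        Some(start..end)
    } else {
        None
    }
}
```

Neither function can do its job from its own table's bytes alone: decode_loca needs a flag from head, and glyph_range needs the output of decode_loca. That is exactly the state I want to feed into serde.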

Obviously I need to implement Deserialize myself and write visitors, and I've got my head around doing that. When there are dependencies from a table to a subtable, I've also been able to work that out using deserialize_seed inside the table's visitor. But I don't know how to apply that to pass in information between tables.

I think I need to store what is essentially configuration information (the value of indexToLocFormat, the array of offsets) when constructing my deserializer object:

pub struct Deserializer<'de> {
    input: &'de [u8],
    ptr: usize,
    locaShortVersion: Option<bool>,
    glyfOffsets: Option<Vec<u32>>, 
    ...
}

The problem is that I don't know how to retrieve that information when I'm inside the Visitor impl for the struct; I don't know how to get at the deserializer object at all, let alone how to type things so that I get at my Deserializer object with the configuration fields, not just a generic serde::de::Deserializer:

impl<'de> Visitor<'de> for LocaVisitor {
    type Value = Vec<u32>;

    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(formatter, "A loca table")
    }
    fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Self::Value, A::Error> {
        let locaShortVersion = /* what goes here? */;
        if locaShortVersion {
            // Short format: read 16-bit offsets and widen them to u32.
            Ok(seq
                .next_element::<Vec<u16>>()?
                .ok_or_else(|| serde::de::Error::custom("Oops"))?
                .into_iter()
                .map(|x| x as u32)
                .collect())
        } else {
            // Long format: the offsets are already 32-bit.
            Ok(seq
                .next_element::<Vec<u32>>()?
                .ok_or_else(|| serde::de::Error::custom("Oops"))?)
        }
    }
}

(terrible code here; if you're wondering why I'm writing Yet Another OpenType Parsing Crate, it's because I want to both read and write font files.)

Upvotes: 3

Views: 383

Answers (1)

Simon Cozens

Reputation: 765

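Actually, I think I've got it. The trick is to do the deserialization in stages. Rather than calling the deserializer module's from_bytes function (which wraps both the struct creation and the T::deserialize call), construct the deserializer yourself and drive it with a value implementing DeserializeSeed, which carries the configuration: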

use serde::de::DeserializeSeed; // Having this trait in scope is also key
let mut de = Deserializer::from_bytes(&binary_loca_table);
let ssd = SomeSpecialistDeserializer { /* ... configuration goes here ... */ };
let loca_table: Vec<u32> = ssd.deserialize(&mut de).unwrap();

In this case, I use a LocaDeserializer defined like so:

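use serde::de::{DeserializeSeed, SeqAccess, Visitor};
use std::fmt;

// The seed: carries the flag from the head table into deserialization.
pub struct LocaDeserializer { locaIs32Bit: bool }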

impl<'de> DeserializeSeed<'de> for LocaDeserializer {
    type Value = Vec<u32>;

    fn deserialize<D>(self, deserializer: D) -> std::result::Result<Self::Value, D::Error>
    where
        D: serde::de::Deserializer<'de>,
    {
        struct LocaDeserializerVisitor {
            locaIs32Bit: bool,
        }

        impl<'de> Visitor<'de> for LocaDeserializerVisitor {
            type Value = Vec<u32>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                write!(formatter, "a loca table")
            }

            fn visit_seq<A>(self, mut seq: A) -> std::result::Result<Vec<u32>, A::Error>
            where
                A: SeqAccess<'de>,
            {
                if self.locaIs32Bit {
                    // Long format: offsets are stored as-is.
                    Ok(seq
                        .next_element::<Vec<u32>>()?
                        .ok_or_else(|| serde::de::Error::custom("Expecting a 32 bit glyph offset"))?)
                } else {
                    // Short format: stored values are the real offsets divided by two.
                    Ok(seq
                        .next_element::<Vec<u16>>()?
                        .ok_or_else(|| serde::de::Error::custom("Expecting a 16 bit glyph offset"))?
                        .iter()
                        .map(|x| (*x as u32) * 2)
                        .collect())
                }
            }
        }

        deserializer.deserialize_seq(LocaDeserializerVisitor {
            locaIs32Bit: self.locaIs32Bit,
        })
    }
}

And now:

    fn loca_de() {
        let binary_loca = vec![
            0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1a,
        ];
        let mut de = Deserializer::from_bytes(&binary_loca);
        let cs: loca::LocaDeserializer = loca::LocaDeserializer { locaIs32Bit: false };
        let floca: Vec<u32> = cs.deserialize(&mut de).unwrap();
        println!("{:?}", floca);
        // [2, 0, 2, 0, 0, 52]

        let mut de = Deserializer::from_bytes(&binary_loca);
        let cs: loca::LocaDeserializer = loca::LocaDeserializer { locaIs32Bit: true };
        let floca: Vec<u32> = cs.deserialize(&mut de).unwrap();
        println!("{:?}", floca);
        // [65536, 65536, 26]
    }
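The flag is hard-coded in that test. In a real font it would come from the head table, which has to be read first. Here is a rough std-only sketch of that first stage, assuming the raw head bytes are at hand. The helper loca_is_32_bit is hypothetical (not part of serde or of my deserializer); the offset of 50 is where the OpenType spec places indexToLocFormat in head:

```rust
/// Byte offset of the `indexToLocFormat` field within the `head` table.
const INDEX_TO_LOC_FORMAT_OFFSET: usize = 50;

/// Stage one: read `indexToLocFormat` from the raw `head` table.
/// Returns Some(false) for short (16-bit) offsets, Some(true) for long
/// (32-bit) offsets, and None if the table is truncated or the field
/// holds anything other than 0 or 1.
fn loca_is_32_bit(head: &[u8]) -> Option<bool> {
    let raw = head.get(INDEX_TO_LOC_FORMAT_OFFSET..INDEX_TO_LOC_FORMAT_OFFSET + 2)?;
    // The field is a big-endian signed 16-bit integer.
    match i16::from_be_bytes([raw[0], raw[1]]) {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

The result would then be used as the locaIs32Bit field when building the LocaDeserializer seed for the second stage.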

Serde is very nice - once you have got your head around it.

Upvotes: 0
