serde: speed up custom enum deserialization

Question

My program parses a fairly large JSON document (30 MB). On a machine with a slow CPU this takes 70 ms, and I want to speed it up. I found that 27% of the parsing time is spent in my foo_document_type_deserialize function. Is it possible to improve this function? Maybe there is a way to skip the String allocation in this line: let s = String::deserialize(deserializer)?;

I am completely sure that the strings representing the enum values don't contain special JSON characters such as \b, \f, " or \, so it should be safe to work with the unescaped string.

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "camelCase")]
pub struct FooDocument {
    // other fields...
    #[serde(rename = "type")]
    #[serde(deserialize_with = "foo_document_type_deserialize")]
    doc_type: FooDocumentType,
}

fn foo_document_type_deserialize<'de, D>(deserializer: D) -> Result<FooDocumentType, D::Error>
where
    D: Deserializer<'de>,
{
    use self::FooDocumentType::*;
    let s = String::deserialize(deserializer)?;
    match s.as_str() {
        "tir lim bom bom" => Ok(Var1),
        "hgga;hghau" => Ok(Var2),
        "hgueoqtyhit4t" => Ok(Var3),
        "Text" | "Type not detected" | "---" => Ok(Unknown),
        _ => Err(serde::de::Error::custom(format!(
            "Unsupported foo document type '{}'",
            s
        ))),
    }
}

#[derive(Debug, Clone, Copy)]
pub enum FooDocumentType {
    Unknown,
    Var1,
    Var2,
    Var3,
}

dtolnay · Accepted Answer

The custom impl you've written is in a form that serde_derive can generate:

#[derive(Deserialize, Debug)]
pub enum FooDocumentType {
    #[serde(rename = "Text", alias = "Type not detected", alias = "---")]
    Unknown,
    #[serde(rename = "tir lim bom bom")]
    Var1,
    #[serde(rename = "hgga;hghau")]
    Var2,
    #[serde(rename = "hgueoqtyhit4t")]
    Var3,
}

The resulting derived code does not allocate memory and is about 2× faster in a quick microbenchmark compared to your code when I measure the following:

serde_json::from_str::<FooDocument>(r#"{"type":"hgga;hghau"}"#).unwrap()
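
If the mapping ever needs logic that rename and alias cannot express, the same non-allocating idea can be applied by hand: match on a borrowed &str and build a String only on the error path. The following is a minimal std-only sketch, not serde's generated code. The name parse_doc_type is a hypothetical helper, and the PartialEq/Eq derives are added here only so results can be compared. A hand-written serde Visitor could call such a function from its visit_str method, which receives a borrowed &str.

```rust
// Sketch only: `parse_doc_type` is a hypothetical helper, not part of serde.
// PartialEq/Eq are added for illustration so results can be compared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FooDocumentType {
    Unknown,
    Var1,
    Var2,
    Var3,
}

/// Maps a borrowed string slice to a variant.
/// The success path never allocates; a `String` is built only
/// when the input is not recognised.
pub fn parse_doc_type(s: &str) -> Result<FooDocumentType, String> {
    use FooDocumentType::*;
    match s {
        "tir lim bom bom" => Ok(Var1),
        "hgga;hghau" => Ok(Var2),
        "hgueoqtyhit4t" => Ok(Var3),
        "Text" | "Type not detected" | "---" => Ok(Unknown),
        other => Err(format!("Unsupported foo document type '{}'", other)),
    }
}
```

For example, parse_doc_type("---") returns Ok(FooDocumentType::Unknown), while an unrecognised string returns an Err carrying the message.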
