user1244932

Reputation: 8112

serde: speed up custom enum deserialization

My program parses a fairly large JSON document (30 MB). On a machine with a slow CPU this takes 70 ms, and I want to speed it up. I found that 27% of the parsing time is spent in my foo_document_type_deserialize function. Is it possible to improve this function? Maybe there is a way to skip the String allocation in let s = String::deserialize(deserializer)?;

I am completely sure that the strings representing enum values don't contain special JSON characters such as \b \f \n \r \t \" \\, so it should be safe to work with the unescaped string.

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "camelCase")]
pub struct FooDocument {
    // other fields...
    #[serde(rename = "type")]
    #[serde(deserialize_with = "foo_document_type_deserialize")]
    doc_type: FooDocumentType,
}

fn foo_document_type_deserialize<'de, D>(deserializer: D) -> Result<FooDocumentType, D::Error>
where
    D: Deserializer<'de>,
{
    use self::FooDocumentType::*;
    let s = String::deserialize(deserializer)?;
    match s.as_str() {
        "tir lim bom bom" => Ok(Var1),
        "hgga;hghau" => Ok(Var2),
        "hgueoqtyhit4t" => Ok(Var3),
        "Text" | "Type not detected" | "---" => Ok(Unknown),
        _ => Err(serde::de::Error::custom(format!(
            "Unsupported foo document type '{}'",
            s
        ))),
    }
}

#[derive(Debug, Clone, Copy)]
pub enum FooDocumentType {
    Unknown,
    Var1,
    Var2,
    Var3,
}

Upvotes: 2

Views: 1294

Answers (1)

dtolnay

Reputation: 11013

The custom impl you've written is in a form that serde_derive can generate:

#[derive(Deserialize, Debug, Clone, Copy)]
pub enum FooDocumentType {
    #[serde(rename = "Text", alias = "Type not detected", alias = "---")]
    Unknown,
    #[serde(rename = "tir lim bom bom")]
    Var1,
    #[serde(rename = "hgga;hghau")]
    Var2,
    #[serde(rename = "hgueoqtyhit4t")]
    Var3,
}

The resulting derived code does not allocate memory and is about 2× faster in a quick microbenchmark compared to your code when I measure the following:

serde_json::from_str::<FooDocument>(r#"{"type":"hgga;hghau"}"#).unwrap()

Upvotes: 6
