Timmmm

Reputation: 97048

Lazily loading large strings from a JSON file in Rust

I have a large JSON file containing a medium number of very large strings and I don't want to store it all in memory at once.

Is it possible to use serde_json (or similar) to load everything except these large strings? I want the large strings to be loaded lazily. For example, the parser could store the file offset and length instead of the actual string, and then provide a function to read the string on demand.

Upvotes: -1

Views: 79

Answers (1)

啊鹿Dizzyi

Reputation: 1048

You will need a custom parser to store the parsed data lazily, but using serde_json::from_reader(_) together with std::io::BufReader helps a bit, because the file is never held in memory as one big string.
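The "store the offset and length, read on demand" idea from the question could be sketched roughly as below. This is only a minimal illustration using the standard library, not something serde_json provides: LazyStr, index_large_strings and read_raw are hypothetical names, the scanner only tracks quotes and backslash escapes (it does not validate the JSON, and it would also index object keys if they were long enough), and the bytes it returns are still JSON-escaped.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical handle: where the raw body of one string literal lives
/// in the file (the surrounding quotes are not included).
#[derive(Debug, Clone, Copy, PartialEq)]
struct LazyStr {
    offset: u64,
    len: u64,
}

/// Scan the JSON once and record every string literal whose raw
/// (still escaped) body is at least `min_len` bytes long.
/// Wrap a `File` in a `BufReader` before passing it in, since this
/// reads one byte at a time.
fn index_large_strings<R: Read>(reader: R, min_len: u64) -> io::Result<Vec<LazyStr>> {
    let mut spans = Vec::new();
    let mut in_string = false;
    let mut escaped = false;
    let mut start = 0u64;
    for (pos, byte) in (0u64..).zip(reader.bytes()) {
        let byte = byte?;
        if !in_string {
            if byte == b'"' {
                in_string = true;
                start = pos + 1; // first byte after the opening quote
            }
        } else if escaped {
            escaped = false; // this byte is consumed by the preceding backslash
        } else if byte == b'\\' {
            escaped = true;
        } else if byte == b'"' {
            in_string = false;
            let len = pos - start;
            if len >= min_len {
                spans.push(LazyStr { offset: start, len });
            }
        }
    }
    Ok(spans)
}

/// Read the raw bytes of one indexed string on demand.
fn read_raw<F: Read + Seek>(file: &mut F, span: LazyStr) -> io::Result<Vec<u8>> {
    file.seek(SeekFrom::Start(span.offset))?;
    let mut buf = vec![0u8; span.len as usize];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for the file; a real program would open a File.
    let json = br#"{"id": 1, "blob": "AAAAAAAAAAAAAAAAAAAA", "note": "a \"b\""}"#;
    let spans = index_large_strings(Cursor::new(&json[..]), 10)?;
    let mut file = Cursor::new(&json[..]);
    for span in &spans {
        let raw = read_raw(&mut file, *span)?;
        println!("{:?} -> {}", span, String::from_utf8_lossy(&raw));
    }
    Ok(())
}
```

Only the index (a few bytes per large string) stays in memory. To get the decoded value, one option would be to put the quotes back around the raw bytes and hand them to a JSON parser when the string is actually needed.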

Edit

Changed print_mem_stat() to also report peak allocation.

Rough Experiment

use std::io::{BufReader, BufWriter, Read};

use peak_alloc::PeakAlloc;

#[global_allocator]
static PEAK_ALLOC: PeakAlloc = PeakAlloc;

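// Prints current heap usage (in MB) and the peak usage so far (in GB).
fn print_mem_stat() {
    let current_mem = PEAK_ALLOC.current_usage_as_mb();
    println!("This program currently uses {} MB of RAM.", current_mem);
    // Note the different unit: the peak is reported in GB, not MB.
    let peak_mem = PEAK_ALLOC.peak_usage_as_gb();
    println!("The max amount that was used {}", peak_mem);
    println!();
}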

fn main() {
    println!("start");
    print_mem_stat();
    let path = std::env::current_dir().unwrap().join("data.json");
    
    {
        let data = (0..(1<<26)).collect::<Vec<_>>();
        let f = std::fs::OpenOptions::new().write(true).truncate(true).create(true).open(&path).unwrap();
        let wtr = BufWriter::new(f);
        serde_json::to_writer_pretty(wtr, &data).unwrap();
        println!("writing data");
        print_mem_stat();
    }
    
    // parse from a BufReader
    {
        let f = std::fs::OpenOptions::new().read(true).open(&path).unwrap();
        let rdr = BufReader::new(f);
        // `_v` stays alive until the end of this block, so it is counted below
        let _v: Vec<i32> = serde_json::from_reader(rdr).unwrap();
        println!("reading data from reader");
        print_mem_stat();
    }

    // read to a string before parsing
    {
        let f = std::fs::OpenOptions::new().read(true).open(&path).unwrap();
        let mut rdr = BufReader::new(f);
        let mut content = String::new();
        rdr.read_to_string(&mut content).unwrap();
        // both `content` and `_v` are alive when memory is measured below
        let _v: Vec<i32> = serde_json::from_str(&content).unwrap();
        println!("reading data as from string");
        print_mem_stat();
    }

    std::fs::remove_file(path).unwrap();
}

Output

start
This program currently uses 0.0010881424 MB of RAM.
The max amount that was used 0.0000010626391

writing data
This program currently uses 256.00116 MB of RAM.
The max amount that was used 0.25000876

reading data from reader
This program currently uses 256.00116 MB of RAM.
The max amount that was used 0.37500876

reading data as from string
This program currently uses 1013.4126 MB of RAM.
The max amount that was used 1.1146607
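
As a sanity check on these numbers, the arithmetic can be written out. The 256 MB figure follows directly from the data (2^26 values of 4 bytes each, with MB meaning 1024 * 1024 bytes here). The explanation of the 0.375 GB peak is only a guess on my part: it assumes the Vec grows by doubling, so that the last reallocation briefly holds the old half-size buffer and the new full-size buffer at the same time. The function names are made up for this illustration.

```rust
const MIB: usize = 1024 * 1024;

/// Bytes needed to hold `elements` values of type i32.
fn vec_bytes(elements: usize) -> usize {
    elements * std::mem::size_of::<i32>()
}

/// Assumed peak while a doubling Vec grows to `final_bytes`:
/// the old buffer (half the size) plus the new one.
fn doubling_peak_bytes(final_bytes: usize) -> usize {
    final_bytes / 2 + final_bytes
}

fn main() {
    let full = vec_bytes(1 << 26);
    let peak = doubling_peak_bytes(full);
    println!("Vec<i32> with 2^26 elements: {} MiB", full / MIB);
    println!("assumed peak during last doubling: {} MiB", peak / MIB);
}
```

Under that assumption the reader-based parse peaks at 384 MiB, which is 0.375 GiB, while the read-to-string variant additionally keeps the whole file contents in memory next to the Vec.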

Upvotes: -2
