LollyPop Lolly

Reputation: 165

Rust serde get runtime heap size of Vec<serde_json::Value>

I'm making a Rust tool that migrates data using the REST API of an internal service. Essentially, it makes a GET request, deserializes the returned array of JSON objects into a struct field of type serde_json::Value, gets a mutable reference to the array (as_array_mut) for a bit of processing, and POSTs the result to another REST API.

This is done in batches of, say, 10000 records per request; however, the data can change unpredictably in size. Usually it's around 10 MiB, but sometimes it can jump to over 400 MiB, which can easily crash the internal service.

Because of this, I want a way to control how many records are fetched per request based on the size of the response data, in other words, the runtime heap size of the Vec<serde_json::Value>. I've tried std::mem::size_of_val and the heapsize crate, but neither worked. One workaround would be to convert the value to a string and take its length (the size doesn't have to be 100% accurate; a rough estimate is fine), but that would mean keeping two copies of the JSON data in memory. That's my last resort, so I wanted to know whether there's another, more efficient way to get the heap size.

Update (response to @Caesar): I was temporarily using this while waiting for any better approaches: let size = serde_json::to_vec(docs)?.len();

Thanks to @Caesar I did some benchmarking, and here's what I got. size is from what I mentioned above; size_new and size_new_for are from Caesar's answer, the difference being that the first uses .map(|v| sizeof_val(v)).sum() and the second is a simple for-in loop that adds each result to a variable.

rows: 1000
size raw: 1360727, fmt: 1.30 MiB, took: 4.980794ms
size_new raw: 3834194, fmt: 3.66 MiB, took: 716.486µs
size_new_for raw: 3834194, fmt: 3.66 MiB, took: 672.523µs

rows: 10000
size raw: 17778816, fmt: 16.96 MiB, took: 62.151661ms
size_new raw: 43805986, fmt: 41.78 MiB, took: 8.775323ms
size_new_for raw: 43805986, fmt: 41.78 MiB, took: 8.158837ms

rows: 50000
size raw: 84354219, fmt: 80.45 MiB, took: 199.82163ms
size_new raw: 175919470, fmt: 167.77 MiB, took: 26.010926ms
size_new_for raw: 175919470, fmt: 167.77 MiB, took: 27.084353ms

Ignoring the timings, there's a huge difference in size compared to turning the entire thing into a vector of bytes (serde_json::to_string takes over twice as long as serde_json::to_vec but gives the same result). I'm confused as to which one is the over-estimate here: isn't turning the entire thing into a string/byte array supposed to be the over-estimate, or have I been using a grossly under-estimated approximation this whole time?

Here's the complete code:

// Estimate via serialization: length of the serialized byte vector.
let size = serde_json::to_vec(docs)?.len() as u64;
// Estimate via sizeof_val, iterator version.
let size_new: usize = docs.iter().map(|v| sizeof_val(v)).sum();
// Estimate via sizeof_val, plain for-in loop.
let mut size_new_for = 0;
for v in docs.iter() {
    size_new_for += sizeof_val(v);
}
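For context, here's roughly how I intend to use whichever estimate wins: measure bytes-per-record in the batch just fetched and size the next request to fit a byte budget. This is only a sketch with hypothetical names, not the actual tool code:

```rust
// Sketch: derive the next batch size from the measured bytes-per-record
// of the previous batch, so a sudden jump in record size shrinks the
// next request. The bounds and budget are made-up placeholders.
fn next_batch_size(prev_records: usize, prev_bytes: usize, target_bytes: usize) -> usize {
    // Average size of one record in the previous batch (guard against /0).
    let bytes_per_record = (prev_bytes / prev_records.max(1)).max(1);
    // How many records of that size fit the budget, clamped to a sane range.
    (target_bytes / bytes_per_record).clamp(100, 10_000)
}

fn main() {
    // A 10 000-record batch came back as ~400 MiB; with a 10 MiB budget
    // the next request should ask for far fewer records.
    let n = next_batch_size(10_000, 400 * 1024 * 1024, 10 * 1024 * 1024);
    println!("next batch: {n} records"); // 250 with these numbers
}
```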

Upvotes: 0

Views: 882

Answers (1)

Caesar

Reputation: 8544

Calculating the exact memory size of a serde_json::Value is somewhat tricky, for several reasons:

  • You can't access the underlying Map type and ask what capacity its backing allocation has.
  • Allocators have overhead, so even if you know the allocated size, that doesn't translate directly into how much memory you'll need.

In any case, the following function might provide a workable approximation.

fn sizeof_val(v: &serde_json::Value) -> usize {
    std::mem::size_of::<serde_json::Value>()
        + match v {
            serde_json::Value::Null => 0,
            serde_json::Value::Bool(_) => 0,
            serde_json::Value::Number(_) => 0, // Incorrect if arbitrary_precision is enabled. oh well
            serde_json::Value::String(s) => s.capacity(),
            serde_json::Value::Array(a) =>
                // .sum() needs the type annotation here; note that each
                // element's inline size is counted both by sizeof_val and by
                // the capacity term, so this errs on the high side
                a.iter().map(sizeof_val).sum::<usize>()
                    + a.capacity() * std::mem::size_of::<serde_json::Value>(),
            serde_json::Value::Object(o) => o
                .iter()
                .map(|(k, v)| {
                    std::mem::size_of::<String>()
                        + k.capacity()
                        + sizeof_val(v)
                        + std::mem::size_of::<usize>() * 3 // As a crude approximation, I pretend each map entry has 3 words of overhead
                })
                .sum(),
        }
}
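To see why an in-memory estimate like this can come out larger than the serialized length: every enum value occupies at least the size of its largest variant, no matter how compact its JSON text is. A minimal std-only illustration (ToyValue is a simplified stand-in, not serde_json's actual definition):

```rust
// A toy enum with roughly the shape of a JSON value. The String and Vec
// variants each carry a 24-byte inline header on 64-bit targets, so with
// a discriminant and padding the whole enum is typically 32 bytes.
enum ToyValue {
    Null,                 // JSON text: 4 bytes ("null"), in memory: full enum size
    Bool(bool),           // JSON text: 4-5 bytes
    Number(f64),          // JSON text: often just a few digits
    String(String),       // 24-byte header plus heap data
    Array(Vec<ToyValue>), // 24-byte header plus heap data
}

fn main() {
    // Typically prints 32 on 64-bit targets, even though e.g. ToyValue::Null
    // serializes to only 4 bytes of text.
    println!("{}", std::mem::size_of::<ToyValue>());
}
```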

A few thoughts (mostly linux-centric):

  • If you need precise memory sizes, you might be better off by directly measuring your process's memory size via procfs::process::Process::myself().unwrap().status().unwrap().vmrss.unwrap() * 1024. The caveat here is that allocators tend to not give memory back to the OS that quickly, so you might over-estimate.
  • If you're using a custom allocator, you might be able to directly ask it for memory usage statistics.
  • Instead of worrying about controlling the size, you could let the OS warn you about impending memory overuse by registering an eventfd on memory.oom_control (but I think you may have to implement that yourself; I don't see a convenient crate for it). ([Edit]: I needed this elsewhere, and it turned out to be tricky.)
  • (The loupe crate also implements allocation size measuring, but I don't think it supports serde_json.)

Upvotes: 3
