Max888
Max888

Reputation: 3770

Efficiently get all items of a vector with a given id

Say I have a vector of items where each item has an id, like in the example below. I can of course get all the items in the vector with a given id using something like large_vector.iter().filter(|item| item.id == given_id). However, for improved performance I can do some preprocessing and sort the vector by item id and store the bounds for each id, like in the example below. This way I can quickly access a slice of the vector for any given id. I end up doing this alot but feel like I am reinventing the wheel and needlessly opening myself up to bugs. Is there a better way to do this directly, preferably using the standard library else some other library?

use std::{collections::HashMap, ops::Range};

#[derive(Debug)]
struct Item {
    id: String,
    val: f64,
}
impl Item {
    fn new(id: &str, val: f64) -> Item {
        Item { id: id.into(), val }
    }
}

fn main() {
    let mut large_vector = vec![
        Item::new("C", 2.21),
        Item::new("A", 34.2),
        Item::new("B", 23.54),
        Item::new("C", 34.34),
        Item::new("C", 45.21),
        Item::new("B", 21.34),
    ];

    // first sort by id
    large_vector.sort_by(|item1, item2| item1.id.cmp(&item2.id));
    dbg!(&large_vector);

    // now create a HasMap storing bounds for each id
    let mut lookup = HashMap::new();
    let mut start: usize = 0;
    let mut end: usize = 0;
    if let Some(first_item) = large_vector.get(0) {
        let mut current_id = first_item.id.clone();
        // insert bound if entered new id section or is last item
        for item in &large_vector {
            if current_id != item.id {
                lookup.insert(current_id.clone(), Range { start, end });
                current_id = item.id.clone();
                start = end;
            }
            end += 1;
        }
        lookup.insert(current_id.clone(), Range { start, end });
    }

    // test by getting the items for a given id
    dbg!(&lookup);
    let range = lookup.get("C").unwrap();
    dbg!(range);
    let items = large_vector[range.start..range.end]
        .iter()
        .collect::<Vec<_>>();
    dbg!(items);
}
[src/main.rs:26] &large_vector = [
    Item {
        id: "A",
        val: 34.2,
    },
    Item {
        id: "B",
        val: 23.54,
    },
    Item {
        id: "B",
        val: 21.34,
    },
    Item {
        id: "C",
        val: 2.21,
    },
    Item {
        id: "C",
        val: 34.34,
    },
    Item {
        id: "C",
        val: 45.21,
    },
]
[src/main.rs:47] &lookup = {
    "A": 0..1,
    "B": 1..3,
    "C": 3..6,
}
[src/main.rs:49] range = 3..6
[src/main.rs:53] items = [
    Item {
        id: "C",
        val: 2.21,
    },
    Item {
        id: "C",
        val: 34.34,
    },
    Item {
        id: "C",
        val: 45.21,
    },
]

Upvotes: 0

Views: 474

Answers (1)

Finomnis
Finomnis

Reputation: 22501

Assuming that your items have to be in a vector, and you can only sort them, I can think of two possibilities:

  • The solution you proposed. It should be the fastest one for lookup, but has the drawback that the lookup tables get completely invalidated every time you insert/remove an item.
  • Keep the vector sorted and perform a log(n) based divide-and-conquer search to get the range. If you are interested in what I mean with that, I can provide you with some code.

But in general, I think a vector is simply the wrong data structure. I'd try to change that first.

Upvotes: 1

Related Questions