Reputation: 3770
Say I have a vector of items where each item has an id, like in the example below. I can of course get all the items in the vector with a given id using something like large_vector.iter().filter(|item| item.id == given_id)
. However, for improved performance I can do some preprocessing and sort the vector by item id and store the bounds for each id, like in the example below. This way I can quickly access a slice of the vector for any given id. I end up doing this alot but feel like I am reinventing the wheel and needlessly opening myself up to bugs. Is there a better way to do this directly, preferably using the standard library else some other library?
use std::{collections::HashMap, ops::Range};
#[derive(Debug)]
struct Item {
id: String,
val: f64,
}
impl Item {
fn new(id: &str, val: f64) -> Item {
Item { id: id.into(), val }
}
}
fn main() {
let mut large_vector = vec![
Item::new("C", 2.21),
Item::new("A", 34.2),
Item::new("B", 23.54),
Item::new("C", 34.34),
Item::new("C", 45.21),
Item::new("B", 21.34),
];
// first sort by id
large_vector.sort_by(|item1, item2| item1.id.cmp(&item2.id));
dbg!(&large_vector);
// now create a HasMap storing bounds for each id
let mut lookup = HashMap::new();
let mut start: usize = 0;
let mut end: usize = 0;
if let Some(first_item) = large_vector.get(0) {
let mut current_id = first_item.id.clone();
// insert bound if entered new id section or is last item
for item in &large_vector {
if current_id != item.id {
lookup.insert(current_id.clone(), Range { start, end });
current_id = item.id.clone();
start = end;
}
end += 1;
}
lookup.insert(current_id.clone(), Range { start, end });
}
// test by getting the items for a given id
dbg!(&lookup);
let range = lookup.get("C").unwrap();
dbg!(range);
let items = large_vector[range.start..range.end]
.iter()
.collect::<Vec<_>>();
dbg!(items);
}
[src/main.rs:26] &large_vector = [
Item {
id: "A",
val: 34.2,
},
Item {
id: "B",
val: 23.54,
},
Item {
id: "B",
val: 21.34,
},
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]
[src/main.rs:47] &lookup = {
"A": 0..1,
"B": 1..3,
"C": 3..6,
}
[src/main.rs:49] range = 3..6
[src/main.rs:53] items = [
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]
Upvotes: 0
Views: 474
Reputation: 22501
Assuming that your items have to be in a vector, and you can only sort them, I can think of two possibilities:
log(n)
based divide-and-conquer search to get the range. If you are interested in what I mean with that, I can provide you with some code.But in general, I think a vector is simply the wrong data structure. I'd try to change that first.
Upvotes: 1