MistyD
MistyD

Reputation: 17233

Best practice for arranging a vector of objects

I know that searching a sorted vector is far faster than than searching a non sorted vector. This is understandable when the vector stores strings. My question is suppose that a vector stores objects or pointer to objects of a class say person. This class has two properties say a SSN and an age. There are already two predicates available for the vector for (std::find_if) one that searches for the SSN(string) and one that searches for an age(int). My question is what is the best practice of sorting such a vector.

Upvotes: 0

Views: 230

Answers (4)

leander
leander

Reputation: 8737

One of the disadvantages to moving to a "heavier" multi-key data structure is (like those provided by Boost.MultiIndex) that you may lose some locality and performance, depending on your usage.

Consider the number of elements of your container and your access patterns.

If you're building a container and then never modifying it later, but doing lots of lookups, you may find that simply creating and populating the vector, making a copy, and sorting the two copies different ways may be what you want. (If you want to avoid the overhead of duplicate full copies of the data, and don't mind a single extra level of indirection, you could consider something like having your containers contain shared_ptr.)

If you tend to do a large number of Age queries in one lump, then switch to SSN queries, maybe sorting one way, doing the queries, sorting the other way, then doing the other queries would be just fine -- again, it depends on the number of queries between sorts.

If your data structure is small enough (tens of items or less), you may find a linear search for one of your types of lookup is just fine, and you can sort the vector for the other type of lookup -- especially if you tend to favor one type of lookup.

You might also consider splitting your Person into pieces -- a bit like database normalization -- and have one container that "just" contains SSNs and some sort of handle or PersonKey, another than contains just Ages and the key, and a final container that contains the key and the rest. This helps keep your searches very local (see "array of structs" versus "struct of arrays").

Each of these may increase code complexity and maintenance costs, so the usual "your mileage may vary" statement applies. You're probably going to be making tradeoffs with any of these solutions.

Upvotes: 0

Lajos Arpad
Lajos Arpad

Reputation: 76817

When you want to sort something, the best practice is to first ask yourself: what function should determine whether a < b?

Define the function and use that for your sorting.

Upvotes: 1

Benjamin Lindley
Benjamin Lindley

Reputation: 103733

There is no best practice here. If you want to search for the objects by SSN, then sort by SSN. If you want to search for the objects by age, then sort by age. If you want to search by both (alternatively), then don't use a vector. Use something from Boost.MultiIndex.

By the way, searching is only faster on a sorted vector if you use a binary search (lower_bound, upper_bound or equal_range), not a linear search, which is what find_if does.

Upvotes: 6

user1708860
user1708860

Reputation: 1753

That depends on your usage of the vector, if you have to make searches according to the age, then use the age, if it's the SSN, then use the SSN.

Altough, if you use the SSN (why not an integer?) the best practice is probably to use std::unordered_map.

That is because the SSN is unique.

Upvotes: 2

Related Questions