user8024280
user8024280

Reputation:

Sorting a container multiple times, what container and what aproach to use

I have some data which I need to print, for simplicity lets say it is a container (vector) of people which have some parameters. In different parts of my program, I need to print all of them sorted by different parameter. My question are

1.) which container to choose? (personally I chose vector).

2.) What approach is better, sort whole vector every time or it is better to make a copy of that vector and save it sorted? In my solution I sorted the same vector every time, but maybe it is not a correct approach with huge vectors because of speed.

class Person
{
private:
    std::string name;
    std::string surname;
    int age;
public:
    Person(std::string name, std::string surname, int age) : name{ name }, surname{ surname }, age{ age } {};
    void print() { std::cout << name << " " << surname << " " << age << std::endl; };
    static bool sortName(Person const &A, Person const &B) { return A.name < B.name; };
    static bool sortSurname(Person const &A, Person const &B) { return A.surname < B.surname; };
    static bool sortAge(Person const &A, Person const &B) { return A.age < B.age; };
};

main:

int main()
{
    std::vector<Person> persons;
    Person person1("John", "Smith", 30);
    Person person2("Mark", "Cooper", 28);
    Person person3("George", "Orwell", 19);

    persons.push_back(person1);
    persons.push_back(person2);
    persons.push_back(person3);

    std::sort(persons.begin(), persons.end(), Person::sortSurname);
    for (int i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    // do some other stuff here ... and then ...
    std::sort(persons.begin(), persons.end(), Person::sortName);
    for (int i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    // do some other stuff here ... and then ...
    std::sort(persons.begin(), persons.end(), Person::sortAge);
    for (int i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    return 0;
}

Upvotes: 4

Views: 399

Answers (7)

ks1322
ks1322

Reputation: 35708

I would use 3 std::set instances of std::shared_ptr<Person> each sorted by corresponding field of Person:

int main()
{
    std::shared_ptr<Person> person1 = std::make_shared<Person>("John", "Smith", 30);
    std::shared_ptr<Person> person2 = std::make_shared<Person>("Mark", "Cooper", 28);
    std::shared_ptr<Person> person3 = std::make_shared<Person>("George", "Orwell", 19);

    std::set<std::shared_ptr<Person>> persons1([](std::shared_ptr<Person> a, std::shared_ptr<Person> b) {
        return a->name < b->name;
    });
    std::set<std::shared_ptr<Person>> persons2([](std::shared_ptr<Person> a, std::shared_ptr<Person> b) {
        return a->surname < b->surname;
    });
    std::set<std::shared_ptr<Person>> persons3([](std::shared_ptr<Person> a, std::shared_ptr<Person> b) {
        return a->age < b->age;
    });

    persons1.insert(person1);
    persons1.insert(person2);
    persons1.insert(person3);

    persons2.insert(person1);
    persons2.insert(person2);
    persons2.insert(person3);

    persons3.insert(person1);
    persons3.insert(person2);
    persons3.insert(person3);

    return 0;
}
  • By using std::shared_ptr you will not waste memory when storing objects in several containers.
  • std::set is already sorted container, so that you have not to sort it every time when using it, just enumerate elements from begin to end.

Upvotes: 2

EvilTeach
EvilTeach

Reputation: 28837

Consider replacing your vector of Person, by vector of pointer to Person. With that in place, it is pretty cheap to swap two Persons just by swapping the pointers. Then use the functors defined in your class to put the pointers in the desired sort order, and start printing.

Upvotes: 2

Richard Hodges
Richard Hodges

Reputation: 69854

boost::multi_index_container allows you to define a container of any type with any number of different indexes, or views.

The container keeps the indexes up to date automatically upon insertion and removal.

It's a massive template library which takes a little time to get used to, but the documentation is good, with plenty of examples.

Here's an implementation expressed this way:

#include <iostream>
#include <string>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/mem_fun.hpp>

class Person {
private:
    std::string name;
    std::string surname;
    int age;
public:
    Person(std::string name, std::string surname, int age) : name{name}, surname{surname}, age{age} {};

    auto get_name() const -> const std::string& { return name; }
    auto get_surname() const -> const std::string& { return surname; }
    auto get_age() const -> int { return age; }

    void print() const { std::cout << name << " " << surname << " " << age << std::endl; };
};

namespace bmi = boost::multi_index;

struct by_name {};
struct by_surname {};
struct by_age;
using PersonTable = boost::multi_index_container<Person,
        bmi::indexed_by<
                bmi::ordered_non_unique<bmi::tag<by_name>, bmi::const_mem_fun<Person,std::string const&,&Person::get_name>>,
                bmi::ordered_non_unique<bmi::tag<by_surname>, bmi::const_mem_fun<Person,std::string const&,&Person::get_surname>>,
                bmi::ordered_non_unique<bmi::tag<by_age>, bmi::const_mem_fun<Person,int,&Person::get_age>>
        >
>;

int main()
{
    PersonTable people;
    people.insert(Person("John", "Smith", 30));
    people.insert(Person("Mark", "Cooper", 28));
    people.insert(Person("George", "Orwell", 19));

    std::cout << "by name" << std::endl;
    for (auto&& person : people.get<by_name>())
    {
        person.print();
    }
    std::cout << "\nby surname" << std::endl;
    for (auto&& person : people.get<by_surname>())
    {
        person.print();
    }
    std::cout << "\nby age" << std::endl;
    for (auto&& person : people.get<by_age>())
    {
        person.print();
    }
}

expected output:

by name
George Orwell 19
John Smith 30
Mark Cooper 28

by surname
Mark Cooper 28
George Orwell 19
John Smith 30

by age
George Orwell 19
Mark Cooper 28
John Smith 30

documentation here: http://www.boost.org/doc/libs/1_64_0/libs/multi_index/doc/index.html

Upvotes: 5

Jesper Juhl
Jesper Juhl

Reputation: 31447

If the vector is small or the elements are cheap to copy you can probably just re-sort it when needed without any problems.

If the elements of the vector are large and expensive to copy you can sort the vector once in one of the ways you need and then create a second vector of std::reference_wrappers and sort that a different way, to create a second "view" of the original vector that does not modify the original nor copy the elements to the second vector.

As for container choice; just go with std::vector unless you specifically need some special property of one of the other containers.

In any case, benchmark different solutions (with a optimized build) and measure the performance of different solutions before setteling on one.

Upvotes: 1

user7860670
user7860670

Reputation: 37468

Instead of sorting vector of objects (which is rather is expensive for complex objects that have many fields) you should build several vectors of indexes to objects stored in main vector and sort them by various criteria.

#include <algorithm>
...

::std::vector< Person > persons;
//  add persons...

::std::vector< ::std::size_t > sorted_indexes;
sorted_indexes.reserve(persons.size());
{
    ::std::size_t index{};
    ::std::generate
    (
        sorted_indexes.begin()
    ,   sorted_indexes.end()
    ,   [&index]{return index++;}
    );
}
::std::sort
(
    sorted_indexes.begin()
,   sorted_indexes.end()
,   [&persons](::std::size_t const left, ::std::size_t const right)
    {
        return Person::sortSurname(persons[left], persons[right]);
    }
);
for(auto person_index: sorted_indexes)
{
    persons[person_index].print();
}

Also sortSurname should take a const reference to avoid copying:

static bool sortSurname(Person const & left, Person const & right) { return left.surname < right.surname; };

Upvotes: 1

Zakir
Zakir

Reputation: 2382

Among many it depends primarily on 3 factors -
1. Data Size
2. What kind of performance you are looking at
3. Amount of space (memory) you can trade off for #2

In general std::sort() performs at an avg at nlogn -

Complexity On average, linearithmic in the distance between first and last: Performs approximately N*log2(N) (where N is this distance) comparisons of elements, and up to that many element swaps (or moves).

Now if your use case involves the sort methods to be invoked too often, it might make sense to pre-sort and save the vectors - In which case you will have considerable performance gain. Now in this design you have to factor in cases like is the collection modifiable? If yes then how often? then you have to consider avg case insertion performance hit.

So in summary it depends

Upvotes: 1

Azeem
Azeem

Reputation: 14587

IMO, the approach you are using right now is good to go i.e. sort whenever you need it on runtime. For larger datasets, you need to first evaluate your requirements in terms of memory and processing power. For example, for a very large dataset, you won't be able to sort it in memory. And, there'll be synchronization issues in case you decide towards a multithreading solution. So, you'd be needing some specialized solution like a DBMS where you can query sorted data as you required on runtime. You'll have features like indexes to optimize query time.

Upvotes: 1

Related Questions