Reputation: 967
I have a class
class Zaposlenik {
private:
string prezime;
string funkcija;
double placa;
public:
bool operator==(const string& prezime) const;
bool operator<(const string &prezime) const;
bool operator<(const Zaposlenik &other) const;
I use operators with string for binary search and operator< with Zaposlenik for sorting
I can't change the header class I can only write code in .cpp.
I also have
class Firma {
private:
vector<Zaposlenik> zaposlenici;
public:
void sort();
I can't change that class either,I have to write .cpp for it. I upload the 2 .cpp to a auto grading server that inputs 500 000 Zaposlenik into vector zaposlenici and then preforms 2 000 000 searches.
I used qsort and bsearch and it was too slow. It can't be longer then 3s when I upload it.
I have written overloaded operators and I belive they are fine,but apparently qsort can be faster.
Vector is sorted by string prezime and names are from "aaaa" to "ZZZZ" so 4 letter combination of big and small letters.
string funkcija;
and double placa;
don't mean anything for sorting.
Can someone tell me which sort would be faster then qsort? Keep in mind that I don't have any control over main and I can't count members when they are made.
P.S. there are other functions in classes but they don't have any meaning for this part. There is also function for Bsearch but that is as fast as it gets I belive.
Upvotes: 2
Views: 1007
Reputation: 263128
auto grading server that inputs 500 000 Zaposlenik into vector zaposlenici and then preforms 2 000 000 searches. I used qsort and bsearch and it was too slow.
Just to clarify, you did not call qsort before every call to bsearch, right? Because binary search is only fast if the list is already sorted. If you sort the list before every search, you get abysmal performance.
Given the constraints you outlined (all strings are four characters long), I just tested std::sort
vs. a custom bucket sort, and on a million elements, the bucket sort was 8 times faster. Hint: strings of four characters can be encoded in 4 * 6 = 24 bits, so you will need 16.777.216 buckets for counting.
Upvotes: 3
Reputation: 34628
The issue here really is your restriction of the class header. I suspect the bottleneck is either the swapping operation while sorting or the lexical string comparison (or possibly both). If you cannot change the class definition at all, it's going to be tricky to improve that, since you would have to add a lot of helper code in your implementation and everything gets more complicated than it has to be.
Anyhow, here is the approach I would suggest: Since you are sorting based on strings, implement yourself a specialised version of a Trie, you cannot beat the performance of a Trie when sorting sequences lexicographically. You can implement this data structure entirely in your .cpp file and instantiate it in your Firma::sort
method.
As you seem to be focussing on speed, you are probably willing to make a trade-off with regard to memory consumption. So, you implement each Node in your Treap as either an std::vector<std::shared_ptr<Trie>>
which you initialise to a length of 256 (with all slots initialised to nullptr
) or an std::array<std::shared_ptr<Trie>,256>
.
You now basically insert each of your strings into the data structure and then read them all out again. This approach is linear in the total size of all strings combined (and thus optimal).
Side note:
Note that the 256 slot table at each node incurs a constant factor of 256 when traversing the Trie (i.e. when writing the Firma::zaposlenici
member). If you are dealing with ASCII you can reduce the table size to 128 or split individual bytes into half-bytes, thereby incurring an overhead of 2*16 instead of 256.
Edit: If you know that you will only encounter characters from a..z and A..Z then you have a base alphabet size of 2*26 = 52 instead of 256. So your lookup table in each node of the Trie only has to be of size 52 (that is, each node can have at most 52 child nodes).
Upvotes: 1
Reputation: 153919
There are several things to take into account. First and
foremost, you cannot use qsort
on std::vector<Zaposlenik>
.
qsort
uses memcpy
to copy objects when it swaps, and
memcpy
only works on objects with trivial copy. Using qsort
in this case may be faster because it doesn't correcly copy the
objects. You must use std::sort
; nothing else will work.
Having done that: the speed (or at least the party that you can
influence) of std::sort
depends on two things: the speed of
your comparison, and the speed of a swap. You don't show what
bool Zaposlenik::operator<( Zaposlenik const& other ) const;
does,
so we can only guess. If it does anything more than return
prezime, other.prezime )
, then you should write a separate
comparison function, and invoke std::sort
with it. The other
aspect is the swap: std::sort
ultimately uses std::swap
,
whose default implementation will be something like:
template <typename T>
void
std::sort( T& lhs, T& rhs )
{
T tmp( lhs );
lhs = rhs;
rhs = lhs;
}
For many classes, this involves a lot of extra copying; for
std::string
, for example, this would do three deep copies of the
string, which may involve dynamic allocation and freeing of the
memory. std::string
, however, has a member function swap
;
in many cases, it can implement the swap just by swapping
pointers, rather than by doing a deep copy. This can result in
a significant speed up. Your class Zaposlenik
does not do
anything to optimize std::swap
, however, so you get the deep
copies. You should provide a member function swap
:
void Zaposlenik::swap( Zaposlenik& other )
{
swap( prezime, other.prezime );
swap( funkcija, other.funkcija );
swap( placa, other.placa );
}
This function will use the system optimized swaps of
std::string
. To make sure that std::sort
uses it, you
should also provide an overloaded free function swap
(in the
same namespace as Zaposlenik
), which calls your member
function:
void
swap( Zaposlenik& lhs, Zaposlenik& rhs )
{
lhs.swap( rhs );
}
The reason for this: std::sort
calls a free function swap
.
Upvotes: 2
Reputation: 545598
Three things:
Use std::sort
instead of std::qsort
, it’s faster because it can inline calls to the comparison operator (if you define it in the header or enable link-time optimisations).
Override swap
for your class so that it can be swapped efficiently instead of copying via a temporary variable. However, that requires changing the header (because you need access to the private variables).
Since your sorted strings are of fixed length 4, a different sorting algorithm will be beneficial. The classical choice, which is fairly easy to implement, is radix sort. Judging from some of your comments it seems likely that your professor wants you to implement this.
Upvotes: 9