Reputation: 333
I have implemented quick sort algorithm in Rcpp, but it works significantly slower than sort(array, method="quick")
for large arrays. Why?
Here is my Rcpp code
// partition using hoare's scheme
#include <Rcpp.h>
using namespace Rcpp;
int partition(NumericVector a,int start,int end)
{
double pivot = a[end];
int i = start - 1;
int j = end + 1;
//Rcout << a <<"\n";
while(1)
{
do {
i++;
} while (a[i] < pivot);
do {
j--;
} while (pivot < a[j]);
if(i > j)
return j;
//special.Swap(a, i, j);
std::swap(a[i], a[j]);
}
}
void qsort(NumericVector a,int start,int end)
{
//Rcout << start <<"," << end <<"\n";
if(start < end)
{
int P_index = partition(a, start, end);
//Rcout << P_index << "\n";
qsort(a, start, P_index);
qsort(a, P_index + 1, end);
}
}
// [[Rcpp::export]]
NumericVector QuickSortH_WC(NumericVector arr)
{
int len = arr.size();
qsort(arr, 0, len-1);
//Rcout << arr <<"\n";
return 1;
}
Also for arrays with floating values, the algorithm is worse. I want to make a comparison with hoare's and lomuto partitioning scheme, But I do not know whether this implementation has any flaw in it for which algorithm works slower.
Upvotes: 2
Views: 182
Reputation: 26823
The main reason for the inefficiency of your code seems to be mixing of the two partitioning schemes you want to compare. You claim to use the Hoare partition scheme, and the code looks very much like it, but pivot
is calculated according to the Lomuto partition scheme. In addition, you should return j
if i >= j
, not if i > j
. Fixing these two things and replacing i++
with the slightly faster ++i
I get:
// partition using hoare's scheme
#include <Rcpp.h>
using namespace Rcpp;
int partition(NumericVector a,int start,int end)
{
double pivot = a[(start + end) / 2];
int i = start - 1;
int j = end + 1;
//Rcout << a <<"\n";
while(1)
{
do {
++i;
} while (a[i] < pivot);
do {
--j;
} while (pivot < a[j]);
if(i >= j)
return j;
//special.Swap(a, i, j);
std::swap(a[i], a[j]);
}
}
void qsort(NumericVector a,int start,int end)
{
//Rcout << start <<"," << end <<"\n";
if(start < end)
{
int P_index = partition(a, start, end);
//Rcout << P_index << "\n";
qsort(a, start, P_index);
qsort(a, P_index + 1, end);
}
}
// [[Rcpp::export]]
NumericVector QuickSortH_WC(NumericVector arr)
{
int len = arr.size();
qsort(arr, 0, len-1);
//Rcout << arr <<"\n";
return arr;
}
/*** R
set.seed(42)
dat <- runif(1e6)
bench::mark(QuickSortH_WC(dat), sort(dat, method="quick"))
*/
Output
> bench::mark(QuickSortH_WC(dat), sort(dat, method="quick"))
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr
<bch:expr> <bch:> <bch:t> <dbl> <bch:byt> <dbl> <int>
1 QuickSortH_WC(dat) 95.7ms 100.5ms 8.63 2.49KB 43.2 5
2 sort(dat, method = "quick") 15ms 16.5ms 53.1 11.44MB 23.6 27
# … with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
# memory <list>, time <list>, gc <list>
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled.
So while this method is about a factor of 7 slow than R's sort
, it has at least comparable order of magnitude for the run time. (Thanks @JosephWood for digging out the link). And Wikipedia lists even more improvements over these two schemas.
BTW, I also changed the wrapper function to return the changed array. This allows me to use the default behavior of bench::mark
which is to compare the returned results. I find that useful ...
Upvotes: 3
Reputation: 131
Rcpp apply recursive functions badly. I suggest iterative quick sort implementation:
void _Quick_sorti( double _array[],int _l,int _h){
int *_stack=new int [_h-_l+1]; double _tmp;int _i,_p,_top=-1;
_stack[++_top]=_l;_stack[++_top]=_h;
while(_top>=0){
_h=_stack[_top--];_l=_stack[_top--];
_tmp=_array[_h];
_i=_l-1;
for(int _j=_l;_j<=_h-1;_j++){
if(_array[_j]<=_tmp){_i++;std::swap(_array[_i],_array[_j]);}
}
_p=_i+1;
std::swap(_array[_p],_array[_h]);
if(_p-1>_l){_stack[++_top]=_l;_stack[++_top]=_p-1;}
if(_p+1<_h){_stack[++_top]=_p+1;_stack[++_top]=_h;}
}
delete _stack;
}
// [[Rcpp::export]]
SEXP Quick_sorti(SEXP &unsorted) { //run
SEXP temp=clone(unsorted);// or Rf_duplicate
double *z=REAL(temp);
int N=LENGTH(temp)-1;
int k=0;
_Quick_sorti(z,k,N); // note that we have provide lvalue (if we put 0 it will not works int place of N)
return temp;}
The code is adapted from a macros that include '_' prefix and look ugly moreover it use R
internals. Adding stack imply N more memory requirement.
Upvotes: 1