user3507085

Reputation: 720

Rcpp unique order output

I noticed that Rcpp's unique function does not keep the original order of the values:

evalCpp("unique(IntegerVector::create(6,6,1,5,5,1))")
[1] 6 5 1
unique(c(6,6,1,5,5,1))
[1] 6 1 5

Is there a way to avoid this? Thanks for your help.

Upvotes: 1

Views: 485

Answers (4)

zxr6

Reputation: 11

You can subset with duplicated(x) == 0 to deduplicate your vector while keeping the order of first appearance:

IntegerVector x = {6,6,1,5,5,1};
IntegerVector out = x[duplicated(x) == 0];
Rcout << out << std::endl;

Upvotes: 0

SymbolixAU

Reputation: 26258

This is how I've implemented it, along with the issue I was trying to solve when I came up with it. It is based on this answer, which also shows various other solutions and benchmarks.

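  #include <Rcpp.h>
  #include <algorithm>  // std::remove_if
  #include <set>

  // removes repeated values from x, keeping the first occurrence of each
  template < typename T, int RTYPE >
  inline SEXP sexp_unique( Rcpp::Vector< RTYPE > x ) {
    std::set< T > seen;  // values encountered so far
    auto newEnd = std::remove_if( x.begin(), x.end(), [&seen]( const T value ) {
      if ( seen.find( value ) != std::end( seen ) ) {
        return true;   // already seen: mark for removal
      }
      seen.insert( value );
      return false;    // first occurrence: keep
    });
    // remove_if only shifts the kept elements forward; erase drops the tail
    x.erase( newEnd, x.end() );
    return x;
  }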


  // returns unique values in their original input order
  inline SEXP get_sexp_unique( SEXP s ) {

    SEXP s2 = Rcpp::clone( s );

    switch( TYPEOF( s2 ) ) {
    case LGLSXP: {
      // logicals are stored as int; using bool here would fold NA into TRUE
      return sexp_unique< int, LGLSXP >( s2 );
    }
    case REALSXP: {
      return sexp_unique< double, REALSXP >( s2 );
    }
    case INTSXP: {
      return sexp_unique< int, INTSXP >( s2 );
    }
    case STRSXP: {
      return sexp_unique< char* , STRSXP >( s2 );
    }
    default: Rcpp::stop("unknown vector type");
    }
    return R_NilValue; // never reached
  }

Upvotes: 1

Dirk is no longer here

Reputation: 368509

If you look at the (short) source file you see that it uses an internal class IndexHash. I suspect this one sorts by default.

If original order is paramount, I suppose you could write yourself a new convenience wrapper. It cannot be all that hard: at the risk of wasting a few bytes of memory, allocate a temporary logical vector, use a standard hashmap and loop over the incoming vector. For each value, ask whether the hashmap has already seen this value and store the boolean answer. Then use that to index the original vector.
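
For illustration, the approach described above could look roughly like the following in plain standard C++. This is a minimal sketch rather than Rcpp code: unique_in_order is a made-up name, and std::vector<int> stands in for IntegerVector so that the example compiles without R.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the distinct values of x
// in order of first appearance.
std::vector<int> unique_in_order(const std::vector<int>& x) {
    std::unordered_set<int> seen;        // values encountered so far
    std::vector<bool> first(x.size());   // true where a value appears for the first time

    for (std::size_t i = 0; i < x.size(); ++i) {
        // insert() reports in .second whether the value was newly added
        first[i] = seen.insert(x[i]).second;
    }

    // use the logical vector to pick the first occurrences from the input
    std::vector<int> out;
    out.reserve(seen.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (first[i]) out.push_back(x[i]);
    }
    return out;
}
```

For the input 6, 6, 1, 5, 5, 1 this yields 6, 1, 5, the same order that R's unique() returns in the question.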

Chances are this is even implemented somewhere. Also look at Armadillo and Eigen for utility functions.

Upvotes: 2

GoGonzo

Reputation: 2877

This one might help someone. It only compares each element with its predecessor, so it works only for sorted vectors.

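  template <int ITYPE>
  Rcpp::Vector<ITYPE> unique(Rcpp::Vector<ITYPE> x) {
    int n = x.size();
    // nothing to deduplicate; this also avoids reading x(0) from an empty vector
    if (n <= 1) return x;

    Rcpp::Vector<ITYPE> res;
    res.push_back(x(0));

    // keep an element only when it differs from its predecessor
    for (int i = 1; i < n; i++) {
      if (x[i] != x(i - 1)) {
        res.push_back(x(i));
      }
    }

    return res;
  }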
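
The same compare-with-the-previous-element idea is available in the standard library as std::unique. A minimal sketch in plain C++, assuming std::vector<int> in place of an Rcpp vector; unique_adjacent is a hypothetical name:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: drops adjacent duplicates, so each value appears once
// only if equal values are already next to each other (e.g. sorted input).
std::vector<int> unique_adjacent(std::vector<int> x) {
    // std::unique shifts the kept elements to the front and returns the new logical end
    auto new_end = std::unique(x.begin(), x.end());
    x.erase(new_end, x.end());
    return x;
}
```

On unsorted input such as 6, 6, 1, 5, 5, 1 it returns 6, 1, 5, 1, which shows why the input has to be sorted (or at least grouped) first.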


Upvotes: 0
