Todd Young

Reputation: 49

Translating R to Rcpp: mean time difference between successive dates in vector

I have an R function that takes a vector of dates sorted in descending order and returns the mean time difference between successive dates in the vector. I am trying to translate this R function into an Rcpp function.

Here is what I have so far:

sorted_dates <- as.Date(c("2015-09-25", "2015-06-12",
                        "2015-06-12", "2015-03-26"))

mean_time_difference <- function(sorted_dates){
  ### Takes a vector of dates sorted in descending order
  ### Returns the mean time difference between dates.

  time_differences <- integer()
  for(i in 1:(length(sorted_dates) - 1)){
    time_differences[i] <- as.integer(sorted_dates[i] - sorted_dates[i+1])
  }

  return(mean(time_differences))

}

This is my currently broken translation into Rcpp:

cppFunction('double mean_time_diff(DateVector sorted_dates) {
/* Takes a vector of dates sorted in descending order
*/ Returns the mean time difference between dates. 

int n = sorted_dates.size();
IntegerVector time_diff;

for(int i=1; i < (n-1); i++){
  time_diff.push_back( sorted_dates[i] - sorted_dates[i+1] );
}

int m = time_diff.size();
double total = 0; 
for(int i=1; i < m; i++) {
  total += time_diff[i];
}

return total / m;

}')

mean_time_difference(sorted_dates)
mean_time_diff(sorted_dates)

I am sure there is plenty that could be improved in both the R and the Rcpp functions. Can anyone show me how to best implement that function in Rcpp?

Upvotes: 1

Views: 433

Answers (4)

rosscova

Reputation: 5590

EDIT: I just noticed that @Frank has already suggested this in the comments. I still think the code is useful, so I'll leave this here.

To add to this: your problem is simpler than it looks. Since you are (I assume) after fast processing, it may serve you better to simplify the problem than to recode it in a different language. You don't even need to sort: leave your dates as they are and apply the expression below.

You're asking for the mean difference between successive values of a sorted set of numbers, which is the same as:

as.numeric( ( max( dates ) - min( dates ) ) / ( length( dates ) - 1L ) )

The successive differences telescope, so their sum is just the total range of the series. Dividing that range by the number of "gaps" (the number of values minus 1) gives the mean.
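To make the equivalence concrete, here is a minimal sketch in plain standard-library C++ (not Rcpp). The function names `mean_gap_loop` and `mean_gap_range` are made up for illustration, and dates are assumed to be passed as integer day counts, which is how R stores a Date internally (days since 1970-01-01).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of the successive differences days[i] - days[i+1], the long way.
// Assumes the input is sorted in descending order and has >= 2 values.
double mean_gap_loop(const std::vector<int>& days) {
    const std::size_t n = days.size();
    assert(n >= 2);  // at least one gap is needed
    double total = 0.0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        total += days[i] - days[i + 1];
    }
    return total / static_cast<double>(n - 1);
}

// Same quantity from the range alone: (max - min) / (n - 1).
// No sorting is required because only the two extremes matter.
double mean_gap_range(const std::vector<int>& days) {
    assert(days.size() >= 2);
    const auto mm = std::minmax_element(days.begin(), days.end());
    return static_cast<double>(*mm.second - *mm.first) /
           static_cast<double>(days.size() - 1);
}
```

For the four dates in the question, written as day counts {16703, 16598, 16598, 16520}, both functions return 61.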

I haven't tested, but I believe the above will be significantly faster than the other methods here, especially for large datasets.

Upvotes: 2

coatless

Reputation: 20746

A few notes:

  1. C++ indices start at 0, not 1!
  2. Accessing elements with [] does no bounds check, so an out-of-range index is undefined behavior; () does perform a bounds check.
  3. Avoid .push_back: because of the proxy model it copies the data on every call, which causes a lot of heartache (mostly as a slowdown).
  4. You should probably also split this function into a time-differencing routine (see a differencing generic for armadillo) and a mean function.
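The first three notes can be sketched in plain standard-library C++. This is only an analogy, not Rcpp code: `std::vector` stands in for the Rcpp vector types, `.at()` stands in for checked access, and the names `successive_diff` and `index_is_valid` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Successive differences x[i] - x[i+1], using a 0-based loop and a
// result vector that is sized once up front rather than grown per element.
std::vector<int> successive_diff(const std::vector<int>& x) {
    if (x.size() < 2) return {};          // no gaps to compute
    std::vector<int> out(x.size() - 1);   // single allocation
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        out[i] = x[i] - x[i + 1];
    }
    return out;
}

// Checked access: .at() throws std::out_of_range on a bad index,
// whereas operator[] with the same index would be undefined behavior.
bool index_is_valid(const std::vector<int>& x, std::size_t i) {
    try {
        (void)x.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With four input values the result has three elements, so index 2 is the last valid one and index 3 is rejected by the checked accessor.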

Now, time for some code:

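#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::IntegerVector diff_date(Rcpp::DateVector sorted_dates){
  // Length of Time Series
  unsigned int n = sorted_dates.size();

  // Fewer than two dates: nothing to difference (also stops n-1 from wrapping around)
  if(n < 2) return Rcpp::IntegerVector(0);

  // Initialize result
  Rcpp::IntegerVector time_diff(n-1);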

  // Difference operator X_t - X_{t+1}
  for(unsigned int i = 0; i < (n-1); i++){
    time_diff[i] = sorted_dates[i] - sorted_dates[i+1];
  }

  // Return differenced series:
  return time_diff;
}

// [[Rcpp::export]]
double mean_diff_date(Rcpp::DateVector sorted_dates){

  // Difference time series by above routine
  Rcpp::IntegerVector time_diff = diff_date(sorted_dates);

  // Length of Time Series
  unsigned int n = time_diff.size();

  // Mean routine (could be replaced with mean() )
  double total = 0;

  for(unsigned int i = 0; i < n; i++){
     total += time_diff[i];
  }

  return total/n;
}

Upvotes: 1

Todd Young

Reputation: 49

For completeness, following the hint from @DirkEddelbuettel, here is the solution using Rcpp:

cppFunction('double mean_time_diff(DateVector sorted_dates) {

  int n = sorted_dates.size();
  IntegerVector time_diff;

  for(int i=0; i < (n-1); i++){
    time_diff.push_back( sorted_dates[i] - sorted_dates[i+1] );
  }

  int m = time_diff.size();
  double total = 0; 

  for(int i=0; i < m; i++) {
    total += time_diff[i];
  }

  return total / m;

}')

The key is to remember that C++ is zero-indexed.

Upvotes: -2

Dirk is no longer here

Reputation: 368261

Is this what you are looking for, in a plain R approach:

> sorted_dates <- as.Date(c("2015-09-25", "2015-06-12", 
+                         "2015-06-12", "2015-03-26"))
> mean(diff(sorted_dates))
Time difference of -61 days
> mean(as.numeric(diff(sorted_dates)))
[1] -61
> 

You can do these things with Rcpp, but you can probably do them in base R, or with any of the add-on aggregation utilities -- I like data.table.
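One detail worth noting about the output above: diff() subtracts each element from the one after it, so on dates sorted in descending order the mean comes out negative (-61), while the question's loop subtracts the other way round and yields 61. A rough standard-library C++ analogue is `std::adjacent_difference`, which uses the same orientation. The sketch below is not Rcpp; the name `mean_forward_diff` and the choice to return 0 for fewer than two values are assumptions made for illustration.

```cpp
#include <numeric>
#include <vector>

// Mean of x[i+1] - x[i], the same orientation as R's diff().
double mean_forward_diff(const std::vector<int>& x) {
    if (x.size() < 2) return 0.0;  // assumption: report 0 when there is no gap
    std::vector<int> d(x.size());
    std::adjacent_difference(x.begin(), x.end(), d.begin());
    // d[0] is a copy of x[0]; only d[1] onward holds differences.
    const double total = std::accumulate(d.begin() + 1, d.end(), 0.0);
    return total / static_cast<double>(x.size() - 1);
}
```

Negating the result, or reversing the input, recovers the sign convention used by the function in the question.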

Upvotes: 3
