Reputation: 49
I have an R function that takes a vector of dates sorted in descending order and returns the mean time difference between successive dates in the vector. I am trying to translate this R function into an Rcpp function.
Here is what I have so far:
sorted_dates <- as.Date(c("2015-09-25", "2015-06-12",
"2015-06-12", "2015-03-26"))
mean_time_difference <- function(sorted_dates){
### Takes a vector of dates sorted in descending order
### Returns the mean time difference between dates.
time_differences <- integer()
for(i in 1:(length(sorted_dates) - 1)){
time_differences[i] <- as.integer( sorted_dates[i] - sorted_dates[i+1])
}
return(mean(time_differences))
}
This is my currently broken translation into Rcpp:
cppFunction('double mean_time_diff(DateVector sorted_dates) {
/* Takes a vector of dates sorted in descending order
*/ Returns the mean time difference between dates.
int n = sorted_dates.size();
IntegerVector time_diff;
for(int i=1; i < (n-1); i++){
time_diff.push_back( sorted_dates[i] - sorted_dates[i+1] );
}
int m = time_diff.size();
double total = 0;
for(int i=1; i < m; i++) {
total += time_diff[i];
}
return total / m;
}')
mean_time_difference(sorted_dates)
mean_time_diff(sorted_dates)
I am sure there is plenty that could be improved in both the R and the Rcpp functions. Can anyone show me how to best implement that function in Rcpp?
Upvotes: 1
Views: 433
Reputation: 5590
EDIT: I just noticed that @Frank has already suggested this in the comments. I still think the code is useful, so I'll leave this here.
Just to add to this, your problem is actually simpler than it looks. Since you're (I assume) looking for fast processing, it may serve you better to simplify the problem rather than recode it in a different language. In particular, you won't need to do any sorting: just leave your dates as they are and apply the expression below.
You're asking for the mean difference between successive values of a sorted set of numbers, which is the same as:
as.numeric( ( max( dates ) - min( dates ) ) / ( length( dates ) - 1L ) )
This is the total range of the series, divided by the number of "gaps" (equal to the number of values minus 1).
I haven't tested, but I believe the above will be significantly faster than the other methods here, especially for large datasets.
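Why the range trick works: the successive differences telescope, so their sum is just the largest value minus the smallest. A minimal sketch in plain C++ (not Rcpp, so it compiles on its own) that compares both routes. The function names are made up for illustration, and dates are assumed to be whole-day numbers held in a `std::vector<int>`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Mean gap via pairwise differences; expects days sorted in descending order.
double mean_gap_pairwise(const std::vector<int>& days) {
    assert(days.size() >= 2 && "need at least two dates");
    long total = 0;
    for (std::size_t i = 0; i + 1 < days.size(); ++i) {
        total += days[i] - days[i + 1];
    }
    return static_cast<double>(total) / (days.size() - 1);
}

// Mean gap via the range: (max - min) / number of gaps. Input order is irrelevant.
double mean_gap_range(const std::vector<int>& days) {
    assert(days.size() >= 2 && "need at least two dates");
    auto mm = std::minmax_element(days.begin(), days.end());
    return static_cast<double>(*mm.second - *mm.first) / (days.size() - 1);
}
```

With the question's four dates written as day offsets from the earliest one, `{183, 78, 78, 0}`, both functions return 61, and `mean_gap_range` gives the same value for any ordering of the input.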
Upvotes: 2
Reputation: 20746
A few notes:
[] does not provide a bounds check (so an out-of-range index is undefined behavior), whereas () does provide a bounds check when accessing elements.
Avoid .push_back on Rcpp vectors: because of the proxy model it copies the data, which will cause a lot of heartache (mostly a slowdown).
Now, time for some code:
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::IntegerVector diff_date(Rcpp::DateVector sorted_dates){
// Length of Time Series
unsigned int n = sorted_dates.size();
// Initialize result
Rcpp::IntegerVector time_diff(n-1);
// Difference operator X_t - X_{t+1}
for(unsigned int i = 0; i < (n-1); i++){
time_diff[i] = sorted_dates[i] - sorted_dates[i+1];
}
// Return differenced series:
return time_diff;
}
// [[Rcpp::export]]
double mean_diff_date(Rcpp::DateVector sorted_dates){
// Difference time series by above routine
Rcpp::IntegerVector time_diff = diff_date(sorted_dates);
// Length of Time Series
unsigned int n = time_diff.size();
// Mean routine (could be replaced with mean() )
double total = 0;
for(unsigned int i = 0; i < n; i++){
total += time_diff[i];
}
return total/n;
}
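The two notes above (unchecked versus checked access, and preallocating instead of growing the result) can also be tried outside Rcpp. The following is only an analogy in plain C++, assuming `std::vector`, where `operator[]` is unchecked and `.at()` is checked. It is not the Rcpp API, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Successive differences x[i] - x[i+1], written into a preallocated vector.
std::vector<int> successive_diffs(const std::vector<int>& x) {
    // Guard first: with an unsigned size, size() - 1 would wrap around for empty input.
    if (x.size() < 2) return {};
    std::vector<int> out(x.size() - 1);  // allocate once, no push_back
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        out[i] = x[i] - x[i + 1];  // unchecked access; the loop bound keeps it in range
    }
    assert(out.size() + 1 == x.size());  // sanity check on the result length
    return out;
}

// Returns true if a checked access one past the end is rejected with an exception.
bool checked_access_throws(const std::vector<int>& x) {
    try {
        (void)x.at(x.size());  // .at() validates the index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The guard at the top of `successive_diffs` matters for the same reason it would in the Rcpp version: subtracting 1 from an unsigned length of 0 wraps around to a huge value.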
Upvotes: 1
Reputation: 49
For completeness, following the hint from @DirkEddelbuettel, here is the solution using Rcpp:
cppFunction('double mean_time_diff(DateVector sorted_dates) {
int n = sorted_dates.size();
IntegerVector time_diff;
for(int i=0; i < (n-1); i++){
time_diff.push_back( sorted_dates[i] - sorted_dates[i+1] );
}
int m = time_diff.size();
double total = 0;
for(int i=0; i < m; i++) {
total += time_diff[i];
}
return total / m;
}')
The key is to remember that C++ is zero-indexed, so both loops must start at 0 rather than 1.
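To see what starting at 1 costs, here is a plain C++ re-creation of the two loops with the start index as a parameter. This is a sketch rather than Rcpp code: `mean_gap_from` and its `first` parameter are hypothetical, and dates are assumed to be whole-day numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same two-loop structure as above; `first` is the index both loops start from.
double mean_gap_from(const std::vector<int>& days, std::size_t first) {
    assert(days.size() >= 2 && "need at least two dates");
    std::vector<int> gaps;
    // Loop 1: collect successive differences, starting at `first`.
    for (std::size_t i = first; i + 1 < days.size(); ++i) {
        gaps.push_back(days[i] - days[i + 1]);
    }
    // Loop 2: sum the collected differences, again starting at `first`.
    double total = 0;
    for (std::size_t i = first; i < gaps.size(); ++i) {
        total += gaps[i];
    }
    return total / gaps.size();
}
```

For the day offsets `{183, 78, 78, 0}`, `first = 0` collects the gaps 105, 0 and 78 and returns 61. With `first = 1` the first gap is never collected and the first collected gap is never summed, so the result is 78 / 2 = 39.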
Upvotes: -2
Reputation: 368261
Is this what you are looking for, in a plain R approach:
> sorted_dates <- as.Date(c("2015-09-25", "2015-06-12",
+ "2015-06-12", "2015-03-26"))
> mean(diff(sorted_dates))
Time difference of -61 days
> mean(as.numeric(diff(sorted_dates)))
[1] -61
>
You can do these things with Rcpp, but you can probably do them in base R, or with any of the add-on aggregation utilities -- I like data.table.
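Note the sign: diff() subtracts each element from the one after it, so dates in descending order give -61 rather than the +61 the loop in the question produces. A small plain C++ sketch of the same convention, assuming whole-day numbers in a `std::vector<int>` (the function name is made up; `std::adjacent_difference` is the standard library routine):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of x[i+1] - x[i], the same direction as R's diff().
double mean_forward_diff(const std::vector<int>& days) {
    assert(days.size() >= 2 && "need at least two dates");
    std::vector<int> d(days.size());
    std::adjacent_difference(days.begin(), days.end(), d.begin());
    // d[0] is just a copy of days[0]; the actual differences start at d[1].
    long total = std::accumulate(d.begin() + 1, d.end(), 0L);
    return static_cast<double>(total) / (days.size() - 1);
}
```

For the descending offsets `{183, 78, 78, 0}` this returns -61, and for the same values in ascending order it returns 61.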
Upvotes: 3