Bram
Bram

Reputation: 628

recursive sum plus product in r

I have a data frame in R that consists of two (numeric) columns, say A and B. I would like to constructor the column consisting of the elements

  A1+B1
 (A1+B1)*A2+B2
((A1+B1)*A2+B2)*A3+B3
  ...

This probably is an easy one-liner, but I don't see it.

Edit: deleted the word vectorized from the title, since I'm basically just interested in any elegant solution (the dumb ones I can do myself). In F# - which I'm more familiar with - this would be something like (assuming the elements would be in a list as tuples, which would be more idiomatic):

ABlist |> List.fold (fun acc (a,b) -> acc*a+b) 1

Which is still something very short and clear. I'm dragging this up, because I'm an R noob and unfamiliar with it, but I have read somewhere that it's a functional language, so I would guess a solution in terms of a fold over a data-frame would exist?

Upvotes: 2

Views: 645

Answers (4)

Bram
Bram

Reputation: 628

Ok, turns out I was too lazy, I figured it out myself (note: before that there was a good answer using Rcpp already, but I can't use that at work). Basically just a translation to R of what I wrote in my edit on how I would do this in F#:

a <- c(1,2,3)
b <- c(4,5,6)
abList <- list(a,b)
Reduce( function(acc,x) {acc*x[[1]]+x[[2]]},
        abList,
        accumulate=TRUE)

Does the trick. Edit: as per the comment below, it actually doesn't do the trick. If one build abList by

abList <- apply(rbind(a,b),2,as.pairlist)

and then folds by:

Reduce(function(acc,x) {(acc*x[[1]])+x[[2]]},abList,1,accumulate=TRUE)

One gets the right answer (with a 1 prepended because that's the intial value for the accumulator)

Upvotes: 2

IRTFM
IRTFM

Reputation: 263342

This is a different answer driven by the failure of Bram's effort at using Reduce. It builds a list that has the A;B pairs and then sets the intial value for the accumulator to 1 so that the first multiplication doesn't get zeroed out:

abList <- mapply(c, A,B, SIMPLIFY=FALSE)  # keeps as a list
Reduce( function(acc,x) {acc*x[1]+x[2]},
         abList,init=c(1,0),
         accumulate=TRUE)[-1]             # removes the initialization value
#--------
[[1]]
[1] 4 3

[[2]]
[1] 10  8

[[3]]
[1] 31 25

[[4]]
[1] 128 104

[[5]]
[1] 645 525

Might take some further work with s/lapply( ..., "[", 1) to pull out the accumulator

> sapply( Reduce( function(acc,x) {acc*x[1]+x[2]},
+         abList,init=c(1,0),
+         accumulate=TRUE)[-1], "[", 1)
[1]   4  10  31 128 645

Upvotes: 3

IRTFM
IRTFM

Reputation: 263342

X = 1; for ( i in seq(length(A) ) ) { X= B[i]+A[i]*X; print(X) }
[1] 4
[1] 10
[1] 31
[1] 128
[1] 645

If you want to accumulate rather than reprot:

 X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]
#[1]   4  10  31 128 645

Will be ploddingly slow compared to the Rcpp solution, but if you need to do the compilation step on the fly it's only when the lengths are more than 1000 that you might even notice the difference:

> A <- sample(1:10000, 1000);  B <- sample(1:10000, 1000)
> system.time( {X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]})
   user  system elapsed 
  0.014   0.002   0.017 
> library(Rcpp)
> system.time( {sum.prod <- cppFunction(
+ "NumericVector sum_prod(NumericVector x, NumericVector y) {
+   NumericVector result(x.size());
+   result[0] = x[0] + y[0];
+   for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
+   return result;
+ }")
+ sum.prod(A,B) } )
   user  system elapsed 
  0.012   0.002   0.014 

Upvotes: 1

josliber
josliber

Reputation: 44320

This is relatively straightforward in Rcpp, which won't have the performance problems you would see if you tried to implement this with loops in R.

library(Rcpp)
sum.prod <- cppFunction(
"NumericVector sum_prod(NumericVector x, NumericVector y) {
  NumericVector result(x.size());
  result[0] = x[0] + y[0];
  for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
  return result;
}")
sum.prod(c(1, 2, 3, 4, 5), c(3, 2, 1, 4, 5))
# [1]   4  10  31 128 645

I've found Rcpp to be the simplest way to speed up hard-to-vectorize computations.

Upvotes: 1

Related Questions