Reputation: 628
I have a data frame in R that consists of two (numeric) columns, say A and B. I would like to constructor the column consisting of the elements
A1+B1
(A1+B1)*A2+B2
((A1+B1)*A2+B2)*A3+B3
...
This probably is an easy one-liner, but I don't see it.
Edit: deleted the word vectorized from the title, since I'm basically just interested in any elegant solution (the dumb ones I can do myself). In F# - which I'm more familiar with - this would be something like (assuming the elements would be in a list as tuples, which would be more idiomatic):
ABlist |> List.fold (fun acc (a,b) -> acc*a+b) 1
Which is still something very short and clear. I'm dragging this up, because I'm an R noob and unfamiliar with it, but I have read somewhere that it's a functional language, so I would guess a solution in terms of a fold over a data-frame would exist?
Upvotes: 2
Views: 645
Reputation: 628
Ok, turns out I was too lazy, I figured it out myself (note: before that there was a good answer using Rcpp already, but I can't use that at work). Basically just a translation to R of what I wrote in my edit on how I would do this in F#:
a <- c(1,2,3)
b <- c(4,5,6)
abList <- list(a,b)
Reduce( function(acc,x) {acc*x[[1]]+x[[2]]},
abList,
accumulate=TRUE)
Does the trick. Edit: as per the comment below, it actually doesn't do the trick. If one build abList by
abList <- apply(rbind(a,b),2,as.pairlist)
and then folds by:
Reduce(function(acc,x) {(acc*x[[1]])+x[[2]]},abList,1,accumulate=TRUE)
One gets the right answer (with a 1 prepended because that's the intial value for the accumulator)
Upvotes: 2
Reputation: 263342
This is a different answer driven by the failure of Bram's effort at using Reduce. It builds a list that has the A;B pairs and then sets the intial value for the accumulator to 1 so that the first multiplication doesn't get zeroed out:
abList <- mapply(c, A,B, SIMPLIFY=FALSE) # keeps as a list
Reduce( function(acc,x) {acc*x[1]+x[2]},
abList,init=c(1,0),
accumulate=TRUE)[-1] # removes the initialization value
#--------
[[1]]
[1] 4 3
[[2]]
[1] 10 8
[[3]]
[1] 31 25
[[4]]
[1] 128 104
[[5]]
[1] 645 525
Might take some further work with s/lapply( ..., "[", 1)
to pull out the accumulator
> sapply( Reduce( function(acc,x) {acc*x[1]+x[2]},
+ abList,init=c(1,0),
+ accumulate=TRUE)[-1], "[", 1)
[1] 4 10 31 128 645
Upvotes: 3
Reputation: 263342
X = 1; for ( i in seq(length(A) ) ) { X= B[i]+A[i]*X; print(X) }
[1] 4
[1] 10
[1] 31
[1] 128
[1] 645
If you want to accumulate rather than reprot:
X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]
#[1] 4 10 31 128 645
Will be ploddingly slow compared to the Rcpp solution, but if you need to do the compilation step on the fly it's only when the lengths are more than 1000 that you might even notice the difference:
> A <- sample(1:10000, 1000); B <- sample(1:10000, 1000)
> system.time( {X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]})
user system elapsed
0.014 0.002 0.017
> library(Rcpp)
> system.time( {sum.prod <- cppFunction(
+ "NumericVector sum_prod(NumericVector x, NumericVector y) {
+ NumericVector result(x.size());
+ result[0] = x[0] + y[0];
+ for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
+ return result;
+ }")
+ sum.prod(A,B) } )
user system elapsed
0.012 0.002 0.014
Upvotes: 1
Reputation: 44320
This is relatively straightforward in Rcpp, which won't have the performance problems you would see if you tried to implement this with loops in R.
library(Rcpp)
sum.prod <- cppFunction(
"NumericVector sum_prod(NumericVector x, NumericVector y) {
NumericVector result(x.size());
result[0] = x[0] + y[0];
for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
return result;
}")
sum.prod(c(1, 2, 3, 4, 5), c(3, 2, 1, 4, 5))
# [1] 4 10 31 128 645
I've found Rcpp to be the simplest way to speed up hard-to-vectorize computations.
Upvotes: 1