Reputation: 426
My question is similar to this one: I need to create all combinations between a data.frame and a vector, but I need a solution for multicolumn data.frames, so I can reduce computation time for larger problems.
Example of what I'm looking for:
I need to create every combination of 1:3 with itself three times but, in the end, I only need the combinations whose total sum is less than 5.
One way to do this is to simply use expand.grid, which produces 27 combinations, and then keep the 4 combinations that obey my sum rule.
> x = 1:3
> b = expand.grid(x,x,x)
> rows = apply(b,1,sum)
> sum(rows < 5)
[1] 4
# Which rows obey the rule
> b[rows<5,]
   Var1 Var2 Var3
1     1    1    1
2     2    1    1
4     1    2    1
10    1    1    2
That works just fine, but for larger vectors, or for more than three vectors being combined, it takes a lot of processing. I figured that another way to do this would be to divide the task and apply the filter at each step:
> x = 1:3
> a = expand.grid(x,x)
> rows = apply(a,1,sum)
> sum(rows < 5)
[1] 6
# Which rows obey the rule
> a[rows<5,]
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
7    1    3
The next step would be to take these 6 rows from a, combine them with x, and once again subset the result according to my rule, but I don't know how to combine a and x.
Upvotes: 2
Views: 269
Reputation: 26446
You can call expand.grid on the row numbers and then cbind the selected rows together:
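# Cross every row of X with every row of Y (both are coerced to data.frames)
expand.grid.XY <- function(X, Y) {
  X <- as.data.frame(X)
  Y <- as.data.frame(Y)
  # All pairs of row indices; seq_len() is safer than 1:nrow() for empty inputs
  idx <- expand.grid(seq_len(nrow(X)), seq_len(nrow(Y)))
  cbind(X[idx[, 1], , drop = FALSE], Y[idx[, 2], , drop = FALSE])
}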
With your example,
expand.grid.XY(a[rows<5,],x)
    Var1 Var2 Y
1      1    1 1
2      2    1 1
3      3    1 1
4      1    2 1
5      2    2 1
7      1    3 1
1.1    1    1 2
2.1    2    1 2
3.1    3    1 2
4.1    1    2 2
5.1    2    2 2
7.1    1    3 2
1.2    1    1 3
2.2    2    1 3
3.2    3    1 3
4.2    1    2 3
5.2    2    2 3
7.2    1    3 3
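To finish the step-by-step idea from the question, the combine-then-filter step can be repeated in a loop. The following is only a minimal sketch in base R: cross_rows, n_steps and limit are hypothetical names introduced here for illustration, and filtering partial sums early is valid only on the assumption that every value in x is positive (so a partial sum that already reaches the limit can never recover).

```r
# Hypothetical helper (not part of the original answer): cross every row of a
# data.frame with every element of a vector, appending it as a new column.
cross_rows <- function(df, v) {
  idx <- expand.grid(seq_len(nrow(df)), seq_along(v))
  out <- cbind(df[idx[, 1], , drop = FALSE], v[idx[, 2]])
  names(out) <- paste0("Var", seq_len(ncol(out)))
  rownames(out) <- NULL
  out
}

x <- 1:3        # values to combine (assumed positive)
n_steps <- 3    # how many times x is combined with itself
limit <- 5      # keep only rows whose sum is below this

# Start from a single column and filter it
res <- data.frame(Var1 = x)
res <- res[rowSums(res) < limit, , drop = FALSE]

# Add one column at a time, dropping rows that already break the rule
for (k in seq_len(n_steps - 1)) {
  res <- cross_rows(res, x)
  res <- res[rowSums(res) < limit, , drop = FALSE]
}
rownames(res) <- NULL
print(res)
```

For the 1:3 example the intermediate tables hold 3, 6 and 4 rows instead of the full 27, which is where any saving would come from on larger inputs.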
Depending on the nature of your problem, however, you might want to look into the foreach package, which includes a when filter and parallel processing possibilities.
Upvotes: 1