Bernardo

Reputation: 426

Combination of data.frame and vector

My question is similar to this one: I need to create all combinations of the rows of a data.frame with the elements of a vector, but I need a solution that works for multi-column data.frames, so that I can reduce computation time on larger problems.

Example of what I'm looking for:

I need to combine 1:3 with itself three times but, in the end, I only need the combinations whose total sum is less than 5.

One way to do this is to use expand.grid, which produces all 27 combinations, and then keep only the 4 that obey my sum rule.

> x = 1:3
> b = expand.grid(x,x,x)

> rows = apply(b,1,sum)
> sum(rows < 5)
[1] 4

# Which rows obey the rule
> b[rows<5,]
   Var1 Var2 Var3
1     1    1    1
2     2    1    1
4     1    2    1
10    1    1    2

That works fine, but for larger vectors, or for more than three factors, it takes a lot of processing. I figured another way would be to split the task and apply the filter at each step:

> x = 1:3
> a = expand.grid(x,x)
> rows = apply(a,1,sum)
> sum(rows < 5)
[1] 6

# Which rows obey the rule
> a[rows<5,]
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
7    1    3

I would then take these 6 rows from a, combine them with x, and subset the result according to my rule once more, but I don't know how to combine a and x.

Upvotes: 2

Views: 269

Answers (1)

A. Webb

Reputation: 26446

You can call expand.grid on the row numbers of both inputs and cbind the indexed rows together:

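expand.grid.XY <- function(X, Y) {
  # Coerce both inputs so that a plain vector becomes a one-column data.frame
  X <- as.data.frame(X)
  Y <- as.data.frame(Y)
  # All pairs of row indices; seq_len() is also safe when an input has zero rows
  idx <- expand.grid(seq_len(nrow(X)), seq_len(nrow(Y)))
  # drop = FALSE keeps one-column inputs as data.frames
  cbind(X[idx[, 1], , drop = FALSE], Y[idx[, 2], , drop = FALSE])
}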

With your example,

expand.grid.XY(a[rows<5,],x)
    Var1 Var2 Y
1      1    1 1
2      2    1 1
3      3    1 1
4      1    2 1
5      2    2 1
7      1    3 1
1.1    1    1 2
2.1    2    1 2
3.1    3    1 2
4.1    1    2 2
5.1    2    2 2
7.1    1    3 2
1.2    1    1 3
2.2    2    1 3
3.2    3    1 3
4.2    1    2 3
5.2    2    2 3
7.2    1    3 3
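
To finish the two-step approach from the question, the combined rows still have to be filtered with the sum rule. The following is only a sketch of how that could look: it repeats expand.grid.XY so the block runs on its own, uses rowSums in place of apply(..., 1, sum), and the renaming of the Y column to Var3 is my own choice, not something the question asked for.

```r
# Cross every row of X with every element/row of Y (same idea as the
# function above, repeated here so the block is self-contained).
expand.grid.XY <- function(X, Y) {
  X <- as.data.frame(X)
  Y <- as.data.frame(Y)
  idx <- expand.grid(seq_len(nrow(X)), seq_len(nrow(Y)))
  cbind(X[idx[, 1], , drop = FALSE], Y[idx[, 2], , drop = FALSE])
}

x <- 1:3

# Step 1: all pairs, keeping only those whose sum is below 5.
a <- expand.grid(x, x)
a <- a[rowSums(a) < 5, ]

# Step 2: cross the surviving pairs with x and apply the same rule again.
b <- expand.grid.XY(a, x)
names(b) <- paste0("Var", seq_along(b))  # assumption: rename columns to Var1..Var3
b <- b[rowSums(b) < 5, ]
rownames(b) <- NULL                      # drop the duplicated row labels
b
```

This should leave the same 4 combinations as the brute-force version. Note that filtering the intermediate result is only safe here because every value in x is positive, so a partial sum that already breaks the rule can never be repaired by adding another term.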

Depending on the nature of your problem, however, you might want to look into the foreach package, which provides a when filter and supports parallel processing.

Upvotes: 1
