Reputation: 1927
I am trying to create every combination of the elements of two vectors of different sizes in R.
For example, the first vector is
a <- c("ABC", "DEF", "GHI")
and the second one contains dates, currently stored as strings:
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
I need to create a data frame with two columns, like this:
> data
a b
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
In other words, I want every element of the first vector (a) paired with every element of the second vector (b).
An ideal solution would generalize to more input vectors.
See also:
How to generate a matrix of combinations
Upvotes: 149
Views: 153451
Reputation: 18722
You could use rep() and the fact that base R data frames recycle shorter columns:
data.frame(a = rep(a, each = length(b)), b = b)
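The same rep() idea extends to more than two vectors. Below is a minimal sketch built around a hypothetical helper named cross_rep() (it is not part of base R or of the answer above): each vector's elements are repeated once per combination of all later vectors, so the first vector varies slowest, matching the order in the question.

```r
# cross_rep() is a hypothetical helper, not a base R function.
# It builds every combination of the input vectors using rep(),
# with the first vector varying slowest and the last varying fastest.
cross_rep <- function(...) {
  vecs <- list(...)
  lens <- lengths(vecs)
  total <- prod(lens)

  cols <- lapply(seq_along(vecs), function(i) {
    # repeat each element once per combination of the later vectors,
    # then cycle the result up to the full number of rows
    each <- prod(lens[-seq_len(i)])
    rep(vecs[[i]], each = each, length.out = total)
  })

  # use argument names as column names, falling back to Var1, Var2, ...
  nms <- names(vecs)
  if (is.null(nms)) nms <- rep("", length(vecs))
  nms[nms == ""] <- paste0("Var", seq_along(vecs))[nms == ""]
  names(cols) <- nms

  as.data.frame(cols, stringsAsFactors = FALSE)
}

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
cross_rep(a = a, b = b)                      # 15 rows, same order as in the question
cross_rep(a = a, b = b, h = c("AM", "PM"))   # 3 * 5 * 2 = 30 rows
```

For two vectors this gives the same columns as data.frame(a = rep(a, each = length(b)), b = b); the helper only automates the "each" arithmetic for longer argument lists.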
Upvotes: 2
Reputation: 1528
In base R you could try merge(), cbind() with rep(), or expand.grid(). Here is a quick benchmark with a much longer a:
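library(microbenchmark)

a <- seq(1E4)  # much longer than the question's a, so timings are measurable
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

microbenchmark(
  # (1) merge() with no common columns returns the Cartesian product
  "merge (1)" = mmm <- as.matrix(merge(a, b)),
  # (2) repeat a whole length(b) times, repeat each b element length(a) times
  "diy (2)" = {
    ccc <- cbind(rep(a, length(b)),
                 b[rep(seq_along(b), each = length(a))])
  },
  # (3) same as (2), but let cbind() recycle a
  "diy R (3)" = {
    ccc <- cbind(a,
                 b[rep(seq_along(b), each = length(a))])
  },
  # (4) expand.grid() returns a data frame; (1) to (3) return matrices
  "grid (4)" = ggg <- expand.grid(a, b),
  times = 2
)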
Output (only two runs per expression, so treat the timings as rough):
Unit: milliseconds
expr min lq mean median uq max neval
merge (1) 863.3100 863.3100 888.6573 888.6573 914.0046 914.0046 2
diy (2) 117.1912 117.1912 142.1394 142.1394 167.0875 167.0875 2
diy R (3) 34.9320 34.9320 49.4119 49.4119 63.8918 63.8918 2
grid (4) 45.1876 45.1876 46.1592 46.1592 47.1308 47.1308 2
Upvotes: 1
Reputation: 83275
Missing from this r-faq overview is the CJ() function from the data.table package. Using:
library(data.table)
CJ(a, b, unique = TRUE)
gives:
      a          b
 1: ABC 2012-05-01
 2: ABC 2012-05-02
 3: ABC 2012-05-03
 4: ABC 2012-05-04
 5: ABC 2012-05-05
 6: DEF 2012-05-01
 7: DEF 2012-05-02
 8: DEF 2012-05-03
 9: DEF 2012-05-04
10: DEF 2012-05-05
11: GHI 2012-05-01
12: GHI 2012-05-02
13: GHI 2012-05-03
14: GHI 2012-05-04
15: GHI 2012-05-05
NOTE: since version 1.12.2, CJ() automatically names the resulting columns after its arguments.
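CJ() accepts any number of vectors, which covers the generalization asked for in the question. A short sketch, assuming data.table 1.12.2 or later (so columns are named after the arguments); the third vector h is made up for illustration. sorted = FALSE skips sorting the inputs and keeps each one in the order supplied, and unique = TRUE drops duplicated values within each input.

```r
library(data.table)

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
h <- c("AM", "PM")  # hypothetical third vector, for illustration only

# three inputs: 3 * 5 * 2 = 30 rows, first column varying slowest
res <- CJ(a, b, h)
nrow(res)

# sorted = FALSE keeps each input in the order it was supplied
# (here b is reversed, so the latest date comes first)
rev_dates <- CJ(a = a, b = rev(b), sorted = FALSE)
head(rev_dates, 3)

# unique = TRUE removes duplicated values within each input first
CJ(x = c("ABC", "ABC", "DEF"), y = 1:2, unique = TRUE)  # 4 rows, not 6
```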
Upvotes: 28
Reputation: 40171
Since version 1.0.0, tidyr offers its own version of expand.grid(), called expand_grid(). It completes the existing family of expand(), nesting(), and crossing() with a low-level function that works with vectors.
Compared to base::expand.grid(), it:
- varies the first element slowest;
- never converts strings to factors;
- does not add any additional attributes;
- returns a tibble, not a data frame;
- can expand any generalised vector, including data frames.
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
tidyr::expand_grid(a, b)
a b
<chr> <chr>
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
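Because expand_grid() never converts strings and keeps each input's type, it is also a convenient place to turn the date strings into real Date objects. A minimal sketch, assuming tidyr 1.0.0 or later; the extra shift vector is hypothetical and only shows that more than two inputs work.

```r
library(tidyr)

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

# named arguments become column names; as.Date() makes b a Date column
res <- expand_grid(
  a = a,
  b = as.Date(b),
  shift = c("AM", "PM")  # hypothetical third input, for illustration
)

res           # 3 * 5 * 2 = 30 rows; a varies slowest, shift fastest
class(res$b)
```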
Upvotes: 10
Reputation: 779
The tidyr package provides a nice alternative, crossing(), which works better than the classic expand.grid() function because (1) strings are not converted into factors and (2) the sorting is more intuitive:
library(tidyr)
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
crossing(a, b)
# A tibble: 15 x 2
a b
<chr> <chr>
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
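Part of what makes crossing() sort more intuitively is that it de-duplicates and sorts each input before combining, whereas expand_grid() uses the inputs exactly as given. A small comparison, assuming a recent tidyr; the unsorted, duplicated input x is made up for illustration.

```r
library(tidyr)

x <- c("GHI", "ABC", "ABC")  # hypothetical input: unsorted, with a duplicate
y <- 1:2

# crossing(): unique, sorted values of x, so 2 * 2 = 4 rows starting with "ABC"
cr <- crossing(x, y)
cr

# expand_grid(): every element of x as supplied, so 3 * 2 = 6 rows starting with "GHI"
eg <- expand_grid(x, y)
eg
```

If the inputs are already unique and sorted, as in the question, the two functions give the same result.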
Upvotes: 58
Reputation: 51
You can use the order() function to sort by any number of columns. For your example:
df <- expand.grid(a,b)
> df
Var1 Var2
1 ABC 2012-05-01
2 DEF 2012-05-01
3 GHI 2012-05-01
4 ABC 2012-05-02
5 DEF 2012-05-02
6 GHI 2012-05-02
7 ABC 2012-05-03
8 DEF 2012-05-03
9 GHI 2012-05-03
10 ABC 2012-05-04
11 DEF 2012-05-04
12 GHI 2012-05-04
13 ABC 2012-05-05
14 DEF 2012-05-05
15 GHI 2012-05-05
> df[order( df[,1], df[,2] ),]
Var1 Var2
1 ABC 2012-05-01
4 ABC 2012-05-02
7 ABC 2012-05-03
10 ABC 2012-05-04
13 ABC 2012-05-05
2 DEF 2012-05-01
5 DEF 2012-05-02
8 DEF 2012-05-03
11 DEF 2012-05-04
14 DEF 2012-05-05
3 GHI 2012-05-01
6 GHI 2012-05-02
9 GHI 2012-05-03
12 GHI 2012-05-04
15 GHI 2012-05-05
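To sort by every column without listing each one, you can pass the columns to order() through do.call(). A minimal sketch using the question's vectors; naming the expand.grid() arguments and setting stringsAsFactors = FALSE keeps the columns as character.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

df <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)

# do.call() passes each column as a separate argument to order(), so this
# sorts by a, then by b, for any number of columns; unname() stops a column
# called e.g. "decreasing" from being matched to order()'s own arguments
df_sorted <- df[do.call(order, unname(as.list(df))), ]
rownames(df_sorted) <- NULL  # optional: renumber rows 1..n
df_sorted
```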
Upvotes: 4
Reputation: 7475
This may be what you are after:
> expand.grid(a,b)
Var1 Var2
1 ABC 2012-05-01
2 DEF 2012-05-01
3 GHI 2012-05-01
4 ABC 2012-05-02
5 DEF 2012-05-02
6 GHI 2012-05-02
7 ABC 2012-05-03
8 DEF 2012-05-03
9 GHI 2012-05-03
10 ABC 2012-05-04
11 DEF 2012-05-04
12 GHI 2012-05-04
13 ABC 2012-05-05
14 DEF 2012-05-05
15 GHI 2012-05-05
If the resulting order isn't what you want, you can sort afterwards. If you name the arguments to expand.grid(), they become column names:
df = expand.grid(a = a, b = b)
df[order(df$a), ]
expand.grid() also generalizes to any number of input vectors.
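Because expand.grid() varies its first argument fastest, another way to get the order shown in the question without sorting is to pass the vectors in reverse order and then reorder the columns. A minimal sketch; stringsAsFactors = FALSE and KEEP.OUT.ATTRS = FALSE are standard expand.grid() arguments that keep plain character columns and skip the extra "out.attrs" attribute.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

# b comes first, so it varies fastest; a varies slowest
df <- expand.grid(b = b, a = a,
                  stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)

# put the columns back in the order a, b
df <- df[, c("a", "b")]
df
```

The same trick works for more vectors: list them in reverse order of how slowly you want them to vary.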
Upvotes: 180