RB78

Reputation: 13

R - expand.grid only within levels of one variable

I have a dataset similar to:

Site Sample Date:

A   A1 2016-09-01 
A   A1 2016-09-21 
A   A2 2016-09-15 
A   A2 2016-09-21 
B   B1 2016-09-03 
B   B2 2016-09-12 

What I would like to do is apply expand.grid, but only within each level of df$Site, to achieve this:

Site Sample Date:

A   A1  2016-09-01
A   A1  2016-09-15
A   A1  2016-09-21
A   A2  2016-09-01
A   A2  2016-09-15
A   A2  2016-09-21
B   B1  2016-09-03
B   B1  2016-09-12
B   B2  2016-09-03
B   B2  2016-09-12

But I don't know how to specify that with expand.grid, so that I don't end up with every Sample crossed with every Date across all sites:

Site Sample Date:

A   A1  2016-09-01
A   A1  2016-09-03
A   A1  2016-09-12
A   A1  2016-09-15
A   A1  2016-09-21
A   A2  2016-09-01
A   A2  2016-09-03
A   A2  2016-09-12
A   A2  2016-09-15
A   A2  2016-09-21
B   B1  2016-09-01
B   B1  2016-09-03
B   B1  2016-09-12
B   B1  2016-09-15
B   B1  2016-09-21
B   B2  2016-09-01
B   B2  2016-09-03
B   B2  2016-09-12
B   B2  2016-09-15
B   B2  2016-09-21

I hope this is clear; I couldn't figure out how to format these tables very well!

Upvotes: 1

Views: 1109

Answers (2)

akrun

Reputation: 886948

We can do this after grouping by 'Site', using `dplyr` and `tidyr`:

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Site) %>%
   expand(Sample, Date)
#    Site Sample       Date
#   <chr>  <chr>      <chr>
#1      A     A1 2016-09-01
#2      A     A1 2016-09-15
#3      A     A1 2016-09-21
#4      A     A2 2016-09-01
#5      A     A2 2016-09-15
#6      A     A2 2016-09-21
#7      B     B1 2016-09-03
#8      B     B1 2016-09-12
#9      B     B2 2016-09-03
#10     B     B2 2016-09-12

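Or using data.table (note that setDT() converts df1 to a data.table by reference, which changes how `df1[-1]` in the base R code further down is interpreted, so run that code on a plain data.frame):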

library(data.table)
setDT(df1)[, do.call(CJ, lapply(.SD, unique)) , by = Site]
#    Site Sample       Date
# 1:    A     A1 2016-09-01
# 2:    A     A1 2016-09-15
# 3:    A     A1 2016-09-21
# 4:    A     A2 2016-09-01
# 5:    A     A2 2016-09-15
# 6:    A     A2 2016-09-21
# 7:    B     B1 2016-09-03
# 8:    B     B1 2016-09-12
# 9:    B     B2 2016-09-03
#10:    B     B2 2016-09-12

Or we can use a base R solution (note that the result has no 'Site' column; the site appears only in the row names):

do.call(rbind, lapply(split(df1[-1], df1$Site), 
         function(x) expand.grid(lapply(x, unique))))
#    Sample       Date
#A.1     A1 2016-09-01
#A.2     A2 2016-09-01
#A.3     A1 2016-09-21
#A.4     A2 2016-09-21
#A.5     A1 2016-09-15
#A.6     A2 2016-09-15
#B.1     B1 2016-09-03
#B.2     B2 2016-09-03
#B.3     B1 2016-09-12
#B.4     B2 2016-09-12
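
One way to keep 'Site' as a regular column in the base R approach is to pair each split piece with its list name. This is only a sketch under the assumption that the data look like `df1` from the data section; `pieces` and `with_site` are names made up for illustration.

```r
# Example data, as given in the "data" section of this answer.
df1 <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

# Split the non-Site columns by Site; the list names are the Site levels.
pieces <- split(df1[-1], df1$Site)

# Expand each piece and prepend its Site label as a column.
with_site <- Map(function(site, x) {
  grid <- expand.grid(lapply(x, unique), stringsAsFactors = FALSE)
  data.frame(Site = site, grid, stringsAsFactors = FALSE)
}, names(pieces), pieces)

res <- do.call(rbind, with_site)
res
```

The row order still follows expand.grid (first column varying fastest), so sort afterwards if the order matters.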

data

df1 <- structure(list(Site = c("A", "A", "A", "A", "B", "B"), Sample = c("A1", 
"A1", "A2", "A2", "B1", "B2"), Date = c("2016-09-01", "2016-09-21", 
"2016-09-15", "2016-09-21", "2016-09-03", "2016-09-12")), .Names = c("Site", 
"Sample", "Date"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

lmo

Reputation: 38500

Here is a base R solution. You can feed expand.grid the unique values of each column, like this:

do.call(rbind, lapply(split(df, df$Site),
               function(i) with(i, expand.grid(unique(Site), unique(Sample), unique(Date)))))

    Var1 Var2       Var3
A.1    A   A1 2016-09-01
A.2    A   A2 2016-09-01
A.3    A   A1 2016-09-21
A.4    A   A2 2016-09-21
A.5    A   A1 2016-09-15
A.6    A   A2 2016-09-15
B.1    B   B1 2016-09-03
B.2    B   B2 2016-09-03
B.3    B   B1 2016-09-12
B.4    B   B2 2016-09-12

Or call unique on each expanded data.frame:

do.call(rbind, lapply(split(df, df$Site),
                     function(i) with(i, unique(expand.grid(Site, Sample, Date)))))
     Var1 Var2       Var3
A.1     A   A1 2016-09-01
A.9     A   A2 2016-09-01
A.17    A   A1 2016-09-21
A.25    A   A2 2016-09-21
A.33    A   A1 2016-09-15
A.41    A   A2 2016-09-15
B.1     B   B1 2016-09-03
B.3     B   B2 2016-09-03
B.5     B   B1 2016-09-12
B.7     B   B2 2016-09-12
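
Both variants return columns named Var1, Var2 and Var3 because the arguments to expand.grid are unnamed. A possible refinement, sketched here on the question's example data (the object name `res` is just for illustration), is to name the arguments and then sort the rows so the result matches the layout asked for in the question. Sorting works here because Date is a character column in ISO format, so alphabetical and chronological order agree.

```r
# Same example data as in the question, named df as in this answer.
df <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

# Naming the arguments makes expand.grid use them as column names.
res <- do.call(rbind, lapply(split(df, df$Site), function(i) {
  with(i, expand.grid(Site   = unique(Site),
                      Sample = unique(Sample),
                      Date   = unique(Date),
                      stringsAsFactors = FALSE))
}))

# Sort the rows and drop the "A.1"-style row names.
res <- res[order(res$Site, res$Sample, res$Date), ]
rownames(res) <- NULL
res
```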

Upvotes: 0
