S31
S31

Reputation: 934

R: Building Conditional Combinations

I'm looking for an efficient way to build on a dataframe with a number of conditional combinations. The process I'm trying to make into some sort of a "create-df" flexible function works like this:

# STEP 1
level.1 <- c("Netherlands", "New Zealand", "Europe")

# STEP 2: 
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "Paris")
df <- data.frame(level.1, level.2)
df

# STEP 3:
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "Roads", "City", "Roads")
data.frame(level.1, level.2, level.3)

# STEP 4:
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "City", "Roads", "Roads", "City", "Roads")
level.4 <- c(NA, NA, "A-Regulated", "G-Regulated", "A-Regulated", "G-Regulated", NA, NA)
data.frame(level.1, level.2, level.3, level.4)

The part that I'd like to be flexible is that if I added a new variable to these columns, it would know which rule to follow. So if I added a new level to level 4 and say it was called "C-Regulated" which will correspond to the London group, it would create an extra two rows for level 3 to have each combination with C-Regulated, as shown below.

level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "London", "London", "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "City", "City", "Roads", "Roads", "Roads", "City", "Roads")
level.4 <- c(NA, NA, "A-Regulated", "G-Regulated", "C-Regulated", "A-Regulated", "G-Regulated", "C-Regulated", NA, NA)
data.frame(level.1, level.2, level.3, level.4)

I've been playing with having this in a for loop and using expand.grid() to create the dataframes, but I keep getting stuck with the mindset of if-else statements, and don't want to just create a long inefficient code. Any advice?

Upvotes: 1

Views: 53

Answers (1)

jeremycg
jeremycg

Reputation: 24945

You can do this by using a merge, and reduce to loop over it.

At each level you will need to specify only the values you want to join on:

level.1 <- data.frame(country = c("Netherlands", "New Zealand", "Europe"))
level.2 <- data.frame(country = c("Europe"), city = c('paris','london' ))
level.3 <- data.frame(country = c('Europe'), location = c('city', 'roads'))
level.4 <- data.frame(country = c('Europe'), regulation = c("A-Regulated", "G-Regulated", "C-Regulated"))

Then we use a merge, inside a Reduce to do it recursively:

Reduce(function(x, y) merge(x, y, all=TRUE), list(level.1, level.2, level.3, level.4))

       country   city location  regulation
1       Europe  paris     city A-Regulated
2       Europe  paris     city G-Regulated
3       Europe  paris     city C-Regulated
4       Europe  paris    roads A-Regulated
5       Europe  paris    roads G-Regulated
6       Europe  paris    roads C-Regulated
7       Europe london     city A-Regulated
8       Europe london     city G-Regulated
9       Europe london     city C-Regulated
10      Europe london    roads A-Regulated
11      Europe london    roads G-Regulated
12      Europe london    roads C-Regulated
13 Netherlands   <NA>     <NA>        <NA>
14 New Zealand   <NA>     <NA>        <NA>

The Reduce is a little tricky: it is the same as writing out each merge in a nested fashion:

merge(merge(merge(level.1, level.2, all = TRUE), level.3, all = TRUE), level.4, all = TRUE)

Upvotes: 1

Related Questions