Reputation: 934
I'm looking for an efficient way to build on a dataframe with a number of conditional combinations. The process I'm trying to make into some sort of a "create-df" flexible function works like this:
# STEP 1
level.1 <- c("Netherlands", "New Zealand", "Europe")
# STEP 2:
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "Paris")
df <- data.frame(level.1, level.2)
df
# STEP 3:
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "Roads", "City", "Roads")
data.frame(level.1, level.2, level.3)
# STEP 4:
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "City", "Roads", "Roads", "City", "Roads")
level.4 <- c(NA, NA, "A-Regulated", "G-Regulated", "A-Regulated", "G-Regulated", NA, NA)
data.frame(level.1, level.2, level.3, level.4)
The part that I'd like to be flexible is that if I added a new variable to these columns, it would know which rule to follow. So if I added a new level to level 4 and say it was called "C-Regulated" which will correspond to the London group, it would create an extra two rows for level 3 to have each combination with C-Regulated, as shown below.
level.1 <- c("Netherlands", "New Zealand", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe")
level.2 <- c(NA, NA, "London", "London", "London", "London", "London", "London", "Paris", "Paris")
level.3 <- c(NA, NA, "City", "City", "City", "Roads", "Roads", "Roads", "City", "Roads")
level.4 <- c(NA, NA, "A-Regulated", "G-Regulated", "C-Regulated", "A-Regulated", "G-Regulated", "C-Regulated", NA, NA)
data.frame(level.1, level.2, level.3, level.4)
I've been playing with having this in a for loop and using expand.grid() to create the dataframes, but I keep getting stuck with the mindset of if-else statements, and don't want to just create a long inefficient code. Any advice?
Upvotes: 1
Views: 53
Reputation: 24945
You can do this by using a merge, and reduce to loop over it.
At each level you will need to specify only the values you want to join on:
level.1 <- data.frame(country = c("Netherlands", "New Zealand", "Europe"))
level.2 <- data.frame(country = c("Europe"), city = c('paris','london' ))
level.3 <- data.frame(country = c('Europe'), location = c('city', 'roads'))
level.4 <- data.frame(country = c('Europe'), regulation = c("A-Regulated", "G-Regulated", "C-Regulated"))
Then we use a merge, inside a Reduce to do it recursively:
Reduce(function(x, y) merge(x, y, all=TRUE), list(level.1, level.2, level.3, level.4))
country city location regulation
1 Europe paris city A-Regulated
2 Europe paris city G-Regulated
3 Europe paris city C-Regulated
4 Europe paris roads A-Regulated
5 Europe paris roads G-Regulated
6 Europe paris roads C-Regulated
7 Europe london city A-Regulated
8 Europe london city G-Regulated
9 Europe london city C-Regulated
10 Europe london roads A-Regulated
11 Europe london roads G-Regulated
12 Europe london roads C-Regulated
13 Netherlands <NA> <NA> <NA>
14 New Zealand <NA> <NA> <NA>
The Reduce is a little tricky: it is the same as writing out each merge in a nested fashion:
merge(merge(merge(level.1, level.2, all = TRUE), level.3, all = TRUE), level.4, all = TRUE)
Upvotes: 1