Reputation: 107
I am trying to append rows to an R data.frame. Here is an example of a data.frame "foo":
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 2 1 150
1 2 4 50
1 3 1 300
2 1 2 40
2 1 4 90
2 3 2 80
For every A, there are 3 possible values of B, and for every B, there are 4 possible values of C. However, the initial df only contains non-zero values of D. I'd like to manipulate the df so that zeros are included for both B and C. Thus, the df would show 0's in D for any value of B/C that was 0. I have seen questions that address this with one column, but couldn't find a question addressing it with multiple columns. The final df would look like this:
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 1 4 0
1 2 1 150
1 2 2 0
1 2 3 0
1 2 4 50
1 3 1 300
1 3 2 0
1 3 3 0
1 3 4 0
2 1 1 0
2 1 2 40
2 1 3 0
2 1 4 90
2 2 1 0
2 2 2 0
2 2 3 0
2 2 4 0
2 3 1 0
2 3 2 80
2 3 3 0
2 3 4 0
I first tried creating a dummy data frame that then merged with the initial df, but something isn't working right. Here's the current code, which I know is wrong because this code only generates rows based on A. I think I want to make the dummy frame based on A and B but I don't know how - could an if/else function work here?:
# create dummy df
dummy <- as.data.frame(
cbind(
sort(rep(unique(foo$A), 12)),
rep(1:3,length(unique(foo$A)))))
colnames(dummy) <- c("A","B")
foo$A <- as.numeric(foo$A)
foo$B <- as.numeric(foo$C)
# merge with foo
mergedummy <- merge(dummy,foo,all.x=T)
Any insight is greatly appreciated - thanks!
Upvotes: 1
Views: 2023
Reputation: 93813
A one liner:
merge(dat, data.frame(table(dat[1:3]))[-4],all.y=TRUE)
# A B C D
#1 1 1 1 200
#2 1 1 2 50
#3 1 1 3 15
#4 1 1 4 NA
#...
Or maybe less complicated:
out <- data.frame(xtabs(D ~ ., data=dat))
out[do.call(order,out[1:3]),]
# A B C Freq
#1 1 1 1 200
#7 1 1 2 50
#13 1 1 3 15
#19 1 1 4 0
#...
Where dat
is:
dat <- structure(list(A = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(1L,
1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L), C = c(1L, 2L, 3L, 1L, 4L, 1L,
2L, 4L, 2L), D = c(200L, 50L, 15L, 150L, 50L, 300L, 40L, 90L,
80L)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 6
Reputation: 23574
I created a master data frame which includes all combinations of A, B, and C as you describe in the expected outcome. Then, I merge the master data frame and your data frame. Finally, I replaced NA with 0.
master <- data.frame(A = rep(1:2, each = 12),
B = rep(1:3, each = 4),
C = rep(1:4, times = 6))
library(dplyr)
master %>%
left_join(., mydf) %>%
mutate(D = ifelse(D %in% NA, 0, D))
# A B C D
#1 1 1 1 200
#2 1 1 2 50
#3 1 1 3 15
#4 1 1 4 0
#5 1 2 1 150
#6 1 2 2 0
#7 1 2 3 0
#8 1 2 4 50
#9 1 3 1 300
#10 1 3 2 0
#11 1 3 3 0
#12 1 3 4 0
#13 2 1 1 0
#14 2 1 2 40
#15 2 1 3 0
#16 2 1 4 90
#17 2 2 1 0
#18 2 2 2 0
#19 2 2 3 0
#20 2 2 4 0
#21 2 3 1 0
#22 2 3 2 80
#23 2 3 3 0
#24 2 3 4 0
Upvotes: 3
Reputation: 9656
Here is one solution:
foo <- merge(expand.grid(lapply(foo[,1:3], unique)), foo, all=TRUE, sort=TRUE)
foo[is.na(foo)] <- 0
Upvotes: 2