Reputation: 23797
I am aware that merging is a widely covered topic. If you think this is a duplicate, I am very happy to be put onto the question that answers my question, but I haven't found it (Sorry!). Thanks
I have two data frames:
require(dplyr)
set.seed(1)
large_df <- data_frame(id = rep(paste0('id',1:40), each = 3),
age = c(rep(NA,60),rep (sample(20), each = 3)),
col3 = rep(letters[1:20],6), col4 = rep(1:60,2))
small_df <- data_frame(id = paste0('id',1:20),
age = sample(20))
large_df
contains incomplete data (large_df$age
), which is contained in small_df
. Now I would like to bring the information from small_df$age
into large_df$age
(merged by the correct 'id'). I think this must be possible via merge or one of the join functions from dplyr, but several combinations did not bring the result I would like.
I also tried a for loop over the rows:
for(i in nrow(large_df)) {
if (large_df[i,'id'] %in% small_df$id == TRUE) {
large_df[i,'age'] <- small_df$age[which(small_df$id %in% large_df[i,'id'])]
}
}
But this doesnt help, it doesn't even return any result. (Anyone an idea why not?)
My result would look like that:
large_df$age[1:60] <- rep(small_df$age, each = 3)
large_df
# A tibble: 120 x 4
id age col3 col4
<chr> <int> <chr> <int>
1 id1 6 a 1
2 id1 6 b 2
3 id1 6 c 3
4 id2 8 d 4
5 id2 8 e 5
6 id2 8 f 6
7 id3 11 g 7
8 id3 11 h 8
9 id3 11 i 9
10 id4 16 j 10
# ... with 110 more rows
Upvotes: 1
Views: 357
Reputation: 6567
Using your data frames this would do the trick.
result =
large_df %>%
left_join(small_df, by = 'id') %>%
mutate(age = ifelse(is.na(age.x), age.y, age.x)) %>%
dplyr::select(-age.x, -age.y)
result
# A tibble: 120 x 4
id col3 col4 age
<chr> <chr> <int> <int>
1 id1 a 1 19
2 id1 b 2 19
3 id1 c 3 19
4 id2 d 4 5
If both age.x
and age.y
are NA
then NA
would be output in age
.
Upvotes: 3