Reputation: 10996
Suppose I have the following data:
dat1 <- data.frame(id = c("a", "b", "c", "d"),
x = c(1, 2, 3, 4),
y = rep(NA, 4))
dat2 <- data.frame(id = c("a", "b", "c"),
y = c(9, 8, 7))
dat3 <- data.frame(id = c("d"),
y = c(6))
Now, I want to merge/join the data from dat2
and dat3
to dat1 one after the other in a way that the dat1$y values are replaced by the dat2.y or dat3.y values instead of adding these as new columns.
The problem is that merge
or left_join
would not add the values to the existing y column, but add a y.y column and rename the one from dat1 to y.x.
I also thought I could use the rows_update
function from the tidyverse, but the problem is that in my real life case I'm not only matching by one column (here: id), but by several id columns together, but rows_update only allows the by
variable to be one vector.
NOTE: in my real-life use case I have
The expected output after merging dat2 and dat3 to dat1 would be:
id x y
"a" 1 9
"b" 2 8
"c" 3 7
"d" 4 6
Upvotes: 2
Views: 1660
Reputation: 388982
You can get dataframes in a list and left_join
them using reduce
. If every row has only one y
value we can use rowSums
/rowMeans
ignoring NA
value.
library(dplyr)
mget(paste0('dat', 1:3)) %>%
purrr::reduce(left_join, by = 'id') %>%
mutate(y = rowSums(select(., starts_with('y')), na.rm = TRUE)) %>%
select(id, x, y)
# id x y
#1 a 1 9
#2 b 2 8
#3 c 3 7
#4 d 4 6
Upvotes: 2
Reputation: 4456
A very simple answer - but maybe not too generalizable - would be:
dat1$y = c(dat2$y, dat3$y)
With a loop, to do this to several data frames:
newy = numeric()
for(i in 2:ndf){ # Where "ndf" is the number of data frames you have
newy = c(newy, eval(parse(text=paste("dat",i,"$y",sep=""))))}
OBS: evaluating objects by strings, with the eval(parse(text=...))
normaly isn't the best way to do it in R. It is probably best if the data frames were created together in a list (as listing them now would be very manual, atleast with my knowledge), and the loop would be:
newy = numeric()
for(i in 2:ndf){
newy = c(newy, df.list[[i]]$y)}
Upvotes: 1
Reputation: 39595
Try with indexing using %in%
to test the id variables:
#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
x = c(1, 2, 3, 4),
y = rep(NA, 4))
dat2 <- data.frame(id = c("a", "b", "c"),
y = c(9, 8, 7))
dat3 <- data.frame(id = c("d"),
y = c(6))
#Code
dat1$y[dat1$id %in% dat2$id] <- dat2$y[dat2$id %in% dat1$id]
dat1$y[dat1$id %in% dat3$id] <- dat3$y[dat3$id %in% dat1$id]
Output:
id x y
1 a 1 9
2 b 2 8
3 c 3 7
4 d 4 6
You can use a loop with a list to store the objects from dat2
to datn
and then make the assignation of values:
#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
x = c(1, 2, 3, 4),
y = rep(NA, 4))
dat2 <- data.frame(id = c("a", "b", "c"),
y = c(9, 8, 7))
dat3 <- data.frame(id = c("d"),
y = c(6))
#Store Objects in a list
List <- list(dat2,dat3)
#Loop
for(i in 1:length(List))
{
#Data
df <- List[[i]]
#Assign
dat1$y[dat1$id %in% df$id] <- df$y[df$id %in% dat1$id]
}
Output:
dat1
id x y
1 a 1 9
2 b 2 8
3 c 3 7
4 d 4 6
Upvotes: 2