Reputation: 10996

R Add values to existing column in data frame by merging with other data frame

Suppose I have the following data:

dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))

Now, I want to merge/join the data from dat2 and dat3 to dat1 one after the other in a way that the dat1$y values are replaced by the dat2.y or dat3.y values instead of adding these as new columns.

The problem is that merge or left_join would not add the values to the existing y column, but add a y.y column and rename the one from dat1 to y.x.

I also thought I could use the rows_update function from the tidyverse, but the problem is that in my real life case I'm not only matching by one column (here: id), but by several id columns together, but rows_update only allows the by variable to be one vector.

NOTE: in my real-life use case I have

~50 data frames to merge
the uniqueness of my rows can only be determined through multiple id columns
the id columns have different names in my dat1 and all other dat2 to dat50 data frames.

The expected output after merging dat2 and dat3 to dat1 would be:

id    x    y
"a"   1    9
"b"   2    8
"c"   3    7
"d"   4    6

Upvotes: 2

Answers (3)

Ronak Shah

Reputation: 388982

You can get dataframes in a list and left_join them using reduce. If every row has only one y value we can use rowSums/rowMeans ignoring NA value.

library(dplyr)

mget(paste0('dat', 1:3)) %>%
  purrr::reduce(left_join, by = 'id') %>%
  mutate(y = rowSums(select(., starts_with('y')), na.rm = TRUE)) %>%
  select(id, x, y)

#  id x y
#1  a 1 9
#2  b 2 8
#3  c 3 7
#4  d 4 6

Upvotes: 2

Ricardo Semião

Reputation: 4456

A very simple answer - but maybe not too generalizable - would be:

dat1$y = c(dat2$y, dat3$y)

With a loop, to do this to several data frames:

newy = numeric()
for(i in 2:ndf){ # Where "ndf" is the number of data frames you have
    newy = c(newy, eval(parse(text=paste("dat",i,"$y",sep=""))))}

OBS: evaluating objects by strings, with the eval(parse(text=...)) normaly isn't the best way to do it in R. It is probably best if the data frames were created together in a list (as listing them now would be very manual, atleast with my knowledge), and the loop would be:

newy = numeric()
for(i in 2:ndf){
    newy = c(newy, df.list[[i]]$y)}

Upvotes: 1

Duck

Reputation: 39595

Try with indexing using %in% to test the id variables:

#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))
#Code
dat1$y[dat1$id %in% dat2$id] <- dat2$y[dat2$id %in% dat1$id]
dat1$y[dat1$id %in% dat3$id] <- dat3$y[dat3$id %in% dat1$id]

Output:

You can use a loop with a list to store the objects from dat2 to datn and then make the assignation of values:

#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))
#Store Objects in a list
List <- list(dat2,dat3)
#Loop
for(i in 1:length(List))
{
  #Data
  df <- List[[i]]
  #Assign
  dat1$y[dat1$id %in% df$id] <- df$y[df$id %in% dat1$id]
}

Output:

dat1
  id x y
1  a 1 9
2  b 2 8
3  c 3 7
4  d 4 6

Upvotes: 2

R Add values to existing column in data frame by merging with other data frame

Answers (3)

Related Questions