deschen
deschen

Reputation: 10996

R Add values to existing column in data frame by merging with other data frame

Suppose I have the following data:

dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))

Now, I want to merge/join the data from dat2 and dat3 to dat1 one after the other in a way that the dat1$y values are replaced by the dat2.y or dat3.y values instead of adding these as new columns.

The problem is that merge or left_join would not add the values to the existing y column, but add a y.y column and rename the one from dat1 to y.x.

I also thought I could use the rows_update function from the tidyverse, but the problem is that in my real life case I'm not only matching by one column (here: id), but by several id columns together, but rows_update only allows the by variable to be one vector.

NOTE: in my real-life use case I have

The expected output after merging dat2 and dat3 to dat1 would be:

id    x    y
"a"   1    9
"b"   2    8
"c"   3    7
"d"   4    6

Upvotes: 2

Views: 1660

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

You can get dataframes in a list and left_join them using reduce. If every row has only one y value we can use rowSums/rowMeans ignoring NA value.

library(dplyr)

mget(paste0('dat', 1:3)) %>%
  purrr::reduce(left_join, by = 'id') %>%
  mutate(y = rowSums(select(., starts_with('y')), na.rm = TRUE)) %>%
  select(id, x, y)

#  id x y
#1  a 1 9
#2  b 2 8
#3  c 3 7
#4  d 4 6

Upvotes: 2

Ricardo Semi&#227;o
Ricardo Semi&#227;o

Reputation: 4456

A very simple answer - but maybe not too generalizable - would be:

dat1$y = c(dat2$y, dat3$y)

With a loop, to do this to several data frames:

newy = numeric()
for(i in 2:ndf){ # Where "ndf" is the number of data frames you have
    newy = c(newy, eval(parse(text=paste("dat",i,"$y",sep=""))))}

OBS: evaluating objects by strings, with the eval(parse(text=...)) normaly isn't the best way to do it in R. It is probably best if the data frames were created together in a list (as listing them now would be very manual, atleast with my knowledge), and the loop would be:

newy = numeric()
for(i in 2:ndf){
    newy = c(newy, df.list[[i]]$y)}

Upvotes: 1

Duck
Duck

Reputation: 39595

Try with indexing using %in% to test the id variables:

#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))
#Code
dat1$y[dat1$id %in% dat2$id] <- dat2$y[dat2$id %in% dat1$id]
dat1$y[dat1$id %in% dat3$id] <- dat3$y[dat3$id %in% dat1$id]

Output:

  id x y
1  a 1 9
2  b 2 8
3  c 3 7
4  d 4 6

You can use a loop with a list to store the objects from dat2 to datn and then make the assignation of values:

#Data
dat1 <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = rep(NA, 4))

dat2 <- data.frame(id = c("a", "b", "c"),
                   y  = c(9, 8, 7))


dat3 <- data.frame(id = c("d"),
                   y  = c(6))
#Store Objects in a list
List <- list(dat2,dat3)
#Loop
for(i in 1:length(List))
{
  #Data
  df <- List[[i]]
  #Assign
  dat1$y[dat1$id %in% df$id] <- df$y[df$id %in% dat1$id]
}

Output:

dat1
  id x y
1  a 1 9
2  b 2 8
3  c 3 7
4  d 4 6

Upvotes: 2

Related Questions