Marine Régis

Reputation: 531

How to fill a column from a data frame based on another data frame using dplyr

I have two data frames, and I am trying to replace the NAs in a column of the second data frame with values from a column of the first. I would like to do this with the dplyr package, which I am not familiar with.

Here is a reproducible example:

library(dplyr)
## Create the two data frames
dt1 <- data.frame(ID = c(rep(1, 6), rep(2, 6), rep(3, 6)), day = c(seq(0, 5, by = 1), seq(0, 5, by = 1), seq(0, 5, by = 1)), density = sample(1:100, 6*3))
dt2 <- data.frame(ID = c(rep(1, 6), rep(2, 6), rep(3, 6)), day = c(seq(0, 5, by = 1), seq(0, 5, by = 1), seq(0, 5, by = 1)), density = NA)

## Fill the second data frame
dt2[dt2$day == 0, c("density")] <- c(1, 2, 8)
dt2[dt2$day %in% c(1, 2, 3, 4, 5), c("density")] <- dt1[dt1$day %in% c(0, 1, 2, 3, 4), c("density")]
## the values in the column "ID" of dt1 must match the values in the column "ID" of dt2

How can I reproduce the last two commands using the dplyr package?

Here is my attempt:

  dt2_fill <- dt2 %>% 
    mutate(density = if(day == 0){c(1, 2, 8)},
           density = if(day %in% c(1, 2, 3, 4, 5)){dt1[dt1$day %in% c(0, 1, 2, 3, 4), c("density")]})

But this code doesn't work.

Upvotes: 1

Views: 1088

Answers (2)

benc

Reputation: 376

It looks like what you're trying to do here, in practice, is merge two data frames. Your ID and day variables together work as a unique key, except that the two are offset by one day: day d in dt1 supplies the value for day d + 1 in dt2. So what about a solution like this:

dt2 <- dt1 %>% 
  mutate(day = day + 1) %>% # Adjust "day" variable to line up with the "day" variable in dt2
  right_join(dt2 %>% select(-density), by = c("ID", "day"))

That will leave NAs in the density variable for the rows where day == 0. You could fill those using the filter/bind_rows approach from Ronak's answer, or you could assign them with nested ifelse() calls like so:

dt2 <- dt2 %>% 
  mutate(density = ifelse(day > 0, density,
                          ifelse(ID == 1, 1,
                                 ifelse(ID == 2, 2, 8))))

(This is a bit kludge-y, and I suspect there might be a better solution in your real-world case if you want to provide more details.)
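One way to avoid the nested ifelse() calls is to keep the day-0 values in a small lookup table and fill the gaps with coalesce(). This is only a sketch under stated assumptions: the `day0` table, the `density0` column, and the `dt2_filled` name are hypothetical, and a fixed seed is added so the result is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original example
dt1 <- data.frame(ID = rep(c(1, 2, 3), each = 6),
                  day = rep(seq(0, 5, by = 1), times = 3),
                  density = sample(1:100, 6 * 3))
dt2 <- data.frame(ID = rep(c(1, 2, 3), each = 6),
                  day = rep(seq(0, 5, by = 1), times = 3),
                  density = NA)

# Hypothetical lookup table: one starting value per ID for day 0
day0 <- data.frame(ID = c(1, 2, 3), day = 0, density0 = c(1, 2, 8))

dt2_filled <- dt1 %>%
  mutate(day = day + 1) %>%                                  # day d in dt1 feeds day d + 1
  right_join(select(dt2, -density), by = c("ID", "day")) %>% # keep every dt2 row
  left_join(day0, by = c("ID", "day")) %>%                   # attach the day-0 values
  mutate(density = coalesce(as.numeric(density), density0)) %>%
  select(-density0) %>%
  arrange(ID, day)
```

The as.numeric() call is there so both arguments to coalesce() have the same type, since sample(1:100, ...) returns integers while the day-0 values are doubles.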

Another option is just to create your dt2 data frame directly from dt1:

dt2 <- dt1 %>% 
  mutate(day = day + 1) %>% 
  filter(day < 6) %>% 
  bind_rows(tibble(ID = c(1,2,3), day = 0, density = c(1,2,8))) %>% 
  arrange(ID, day)

Upvotes: 2

Ronak Shah

Reputation: 388982

This might not be an ideal solution, but it gives the expected output.

A complete dplyr solution:

library(dplyr)
dt2 %>%
  filter(day == 0) %>%
  mutate(density = c(1, 2, 8)) %>%
  bind_rows(dt2 %>%
              filter(day %in% c(1, 2, 3, 4, 5)) %>%
              mutate(density = dt1 %>%
                       filter(day %in% c(0, 1, 2, 3, 4)) %>%
                       pull(density)))

#   ID day density
#1   1   0       1
#2   2   0       2
#3   3   0       8
#4   1   1      84
#5   1   2      72
#6   1   3       4
#7   1   4      31
#....

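We first filter the rows where day == 0 and assign the values c(1, 2, 8) to them. For the remaining rows (days 1 to 5), we pull the density values from the dt1 rows where day is 0 to 4. The assignment is positional, so it relies on dt1 and dt2 having the same row order.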


We can reduce the complexity a bit by indexing dt1 directly:

dt2 %>%
  filter(day == 0) %>%
  mutate(density = c(1, 2, 8)) %>%
  bind_rows(dt2 %>%
              filter(day %in% c(1, 2, 3, 4, 5)) %>%
              mutate(density = dt1$density[dt1$day %in% c(0, 1, 2, 3, 4)]))
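Because the filled column is just dt1's density shifted down by one day within each ID, a grouped lag() may express the same idea without splitting and re-binding the data. This is a different technique from the code above, offered as a rough sketch: the names `ids`, `start`, and `dt2_lag` are hypothetical, and the seed is an assumption added for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original example
dt1 <- data.frame(ID = rep(c(1, 2, 3), each = 6),
                  day = rep(seq(0, 5, by = 1), times = 3),
                  density = sample(1:100, 6 * 3))

ids   <- c(1, 2, 3)  # hypothetical: the IDs, in the order of the starting values
start <- c(1, 2, 8)  # day-0 value for each ID

dt2_lag <- dt1 %>%
  arrange(ID, day) %>%
  group_by(ID) %>%
  mutate(density = lag(density)) %>%  # shift by one day; first row per ID becomes NA
  ungroup() %>%
  mutate(density = if_else(day == 0,
                           start[match(ID, ids)],  # look up the starting value
                           as.numeric(density)))   # both branches must share a type
```

Unlike the bind_rows version, the rows stay ordered by ID and day, and the result does not depend on dt1 and dt2 sharing the same row order.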

Upvotes: 2
