Kak Schoen
Kak Schoen

Reputation: 384

Fill in missing values in column with a different dataframe

I have an example dataframe that has a column which stores 3 letters of the alphabet per row. The dataframe also has 2 additional columns: date and colour:

Alphabet       Date   Colour
  ABC    2018-09-10   green
  DEF    2017-06-11   red
  GHI    2016-05-12   blue
  JKL            NA   yellow
  MNO            NA   orange
  PQR       Unknown   brown

Some of the dates are missing / unknown in this dataframe. I have another dataframe which also has a alphabet and a date column. This second dataframe includes the dates for the missing dates in the first dataframe:

Alphabet       Date   
  JKL    2017-06-07  
  MNO    2018-08-03   
  PQR    2019-10-07
  STU    2019-11-08
  VWX    2019-12-08   

I want to fill in the missing dates from the first dataframe by matching the Alphabet records from both dataframes and then insert the dates from the second dataframe into the first dataframe.

Desired output:

Alphabet       Date   Colour
  ABC    2018-09-10   green
  DEF    2017-06-11   red
  GHI    2016-05-12   blue
  JKL    2017-06-07   yellow
  MNO    2018-08-03   orange
  PQR    2019-10-07   brown

Any help is appreciated.

Upvotes: 2

Views: 230

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389235

Using dplyr we can left_join df1 and df2 and then use coalesce to fill in missing values.

library(dplyr)

left_join(df1, df2, by = "Alphabet") %>%
   mutate(Date = coalesce(Date.y, Date.x)) %>%
  select(-Date.x, -Date.y)

#  Alphabet Colour       Date
#1      ABC  green 2018-09-10
#2      DEF    red 2017-06-11
#3      GHI   blue 2016-05-12
#4      JKL yellow 2017-06-07
#5      MNO orange 2018-08-03
#6      PQR  brown 2019-10-07

Upvotes: 1

akrun
akrun

Reputation: 887851

One option is a join with data.table

library(data.table)
setDT(df1)[df2, Date := i.Date, on = .(Alphabet)]
df1
#   Alphabet       Date Colour
#1:      ABC 2018-09-10  green
#2:      DEF 2017-06-11    red
#3:      GHI 2016-05-12   blue
#4:      JKL 2017-06-07 yellow
#5:      MNO 2018-08-03 orange
#6:      PQR 2019-10-07  brown

Update

Using the new 'df2n' dataset

i1 <- is.na(df1$Date)|df1$Date %in% "Unknown"
setDT(df1)[df2n[df2n$Alphabet %in% df1$Alphabet[i1],],
         Date := i.Date, on = .(Alphabet)]
df1
#   Alphabet       Date Colour
#1:      ABC 2018-09-10  green
#2:      DEF 2017-06-11    red
#3:      GHI 2016-05-12   blue
#4:      JKL 2017-06-07 yellow
#5:      MNO 2018-08-03 orange
#6:      PQR 2019-10-07  brown

Or using match from base R

i1 <- match(df2$Alphabet, df1$Alphabet)
df1$Date[i1] <- df2$Date

data

df1 <- structure(list(Alphabet = c("ABC", "DEF", "GHI", "JKL", "MNO", 
"PQR"), Date = c("2018-09-10", "2017-06-11", "2016-05-12", NA, 
NA, "Unknown"), Colour = c("green", "red", "blue", "yellow", 
"orange", "brown")), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(Alphabet = c("JKL", "MNO", "PQR"), Date = c("2017-06-07", 
"2018-08-03", "2019-10-07")), class = "data.frame", row.names = c(NA, 
-3L))

df2a  <- structure(list(Alphabet = c("JKL", "MNO", "PQR", "STU", "VWX"
), Date = c("2017-06-07", "2018-08-03", "2019-10-07", "2019-11-08", 
"2019-12-08")), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Related Questions