Matt_at_avid
Matt_at_avid

Reputation: 23

How to replace NA values by matching on ID, with two different data frames

I have two data frames, and I want to transfer the latitude and longitude from one df to another. DF1 is missing most of the lat and long, but DF2 also has a few missing values in the lat and long tool. I want to match on their location id (loc_id), but not all rows in DF1 have a loc_id.

reg_id <- c(1,2,3,4,5,6,7)
loc_id <- c(NA,1,1,NA,2,3,4)
lat <- c(NA,12.56,12.56,NA,NA,13.757,NA)
long <- c(NA,-31.2,-31.2,NA,NA,-56.43,NA)
DF1 <- data.frame(reg_id,loc_id,lat,long)

loc_id <- c(1,2,3,4,5,60)
loc_lat <- c(12.56,20.7,13.757,NA,20.3,50.7)
loc_long <- c(-31.2,-47.3,-56.43,NA,-71.123,-27.1)
DF2 <- data.frame(loc_id,loc_lat,loc_long)

Goal is to end up with a df like this:

reg_id   loc_id   lat       long
  1        NA      NA        NA
  2        1      12.56     -31.2
  3        1      12.56     -31
  4        NA      NA        NA
  5        2      20.7      -47.3
  6        3      13.757   -56.43
  7        4       NA        NA

Note: The lat and long are not the same lengths for each row.

Thank you for your help!

Just some background, I've tried match, setDT, merge, to no avail.

no_ll_DF1 <- which(is.na(DF1$lat) & !is.na(DF1$loc_id))
yes_ll_DF2 <- which(!is.na(DF2$loc_lat))

DF1$lat[match(DF2$loc_id[yes_ll_DF2], DF1$loc_id[no_ll_DF1])] <- DF2$loc_lat[yes_ll_DF2]

When using match, I would get this error:

Error in DF1$lat[match(DF2$loc_id[yes_ll_DF2], DF1$loc_id[no_ll_DF1])] <- DF2$loc_lat[yes_ll_DF2] : NAs are not allowed in subscripted assignments

Upvotes: 2

Views: 825

Answers (1)

fmarm
fmarm

Reputation: 4284

One way to do this using the tidyverse package

library(tidyverse)
# left join to keep all DF1 and add only matching loc_id in DF2
DF1 %>% left_join(DF2,by="loc_id") %>%
        # replace missing lat and long if we have a match in DF2
        mutate(lat=ifelse(is.na(lat),loc_lat,lat),
               long=ifelse(is.na(long),loc_long,long)) %>%
        # remove loc_lat and loc_long columns from dataframe
        select(-loc_lat,-loc_long)

Upvotes: 1

Related Questions