Earthshaker
Earthshaker

Reputation: 599

Replacing NA values from another dataframe by id

I have a dataframe df1 as below -

ID Count 
11  345
22  456
33  786
44  765
55  NA
66  888
77  NA

Then i have a df2 as below -

 ID  Count 
 11  536
 22  654
 33  786
 44  999
 55  890
 66  111
 77  654

I want NA values in df1 to be replaced by values from df2 for those specific id.

The resulting df3 should be:

ID  Count 
11  345
22  456
33  786
44  765
55  890
66  888
77  654

Any help will be great Thanks

Upvotes: 6

Views: 5952

Answers (4)

jemand-r
jemand-r

Reputation: 11

To do that for multiple columns and if NAs happen in all these columns of specific rows, you can simply filter and join dataframes:

missing_id <- df1 %>% filter(is.na(Count)) %>% pull(ID)  #Count = one of the columns with missing data

df1 <- df1 %>% filter(!is.na(Count)) %>%
      full_join(df2 %>% filter(ID %in% missing_id)) %>%
      arrange(ID)

Upvotes: 0

MKR
MKR

Reputation: 20095

Use of dplyr::coalesce will make task simpler.

library(dplyr)

df1 %>% inner_join(df2, by= "ID") %>%
  mutate(Count = coalesce(Count.x, Count.y)) %>%
  select(ID, Count)

#   ID Count
# 1 11   345
# 2 22   456
# 3 33   786
# 4 44   765
# 5 55   890
# 6 66   888
# 7 77   654

Data:

df1 <- read.table(text = 
"ID Count 
11  345
22  456
33  786
44  765
55  NA
66  888
77  NA",
header = TRUE)

df2 <- read.table(text = 
"ID  Count 
11  536
22  654
33  786
44  999
55  890
66  111
77  654",
header = TRUE)

Upvotes: 5

Frank
Frank

Reputation: 66819

You can use an update join to edit those rows in the first table:

library(data.table)
setDT(DF1); setDT(DF2)

DF1[is.na(Count), Count := DF2[.SD, on=.(ID), x.Count]]

How it works

  • DF[i, j] filters by i then does j
  • So in j, .SD refers to the subset of data after filtering with i
  • A j with := adds or modifies columns in the table
  • x[i, on=, x.v] is as described in this similar answer

Upvotes: 2

Jan
Jan

Reputation: 43169

Using dplyr:

library(dplyr)

df1 %>%
  left_join(df2, by = c("ID")) %>%
  mutate(Count = ifelse(is.na(Count.x), Count.y, Count.x)) %>%
  select(-c(Count.x, Count.y))

This yields

  ID Count
1 11   345
2 22   456
3 33   786
4 44   765
5 55   890
6 66   888
7 77   654

Upvotes: 0

Related Questions