Reputation: 21

Replacing values of dataframe with NA values of another dataframe

I would like to replace values of one dataframe with NA of another dataframe that have the same identifier. That is, for all values of df1 that have the same id, assign the "NA" values of df2 at the corresponding id and indices.

I have df1 and df2:

df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))

what i would like is df3:

df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))

I have tried the lookup function and the library "data.table", but i could get the correct df3. Could anyone please help me with this?

Upvotes: 2

Answers (3)

Alan Gómez

Reputation: 378

First you need to transform your "NA"'s with real NA's:

df2[df2=="NA"]<-NA

Then, can perform the replacements with:

df1[is.na(df2[match(df1[,1],df2[,1]),])] <- NA

OUTPUT:

df1
  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

Upvotes: 0

Chris

Reputation: 472

Since your NA values are actually text "NA" you will have to turn all your variables into text (with as.character). You can join both dataframes by id column. Since both dataframes have columns a,b, and c R will rename then a.x, b.x and c.x (df1) and a.y, b.y and c.y (df2). After that you can create new columns a,b, and c. These than have "NA" whenever a.y == "NA" and a.x otherwise (and so on). If your NA values were real NA you need to test differently is.na(value) (see example below in the code).

library(dplyr)

df1 %>%  
  mutate_all(as.character) %>% # allvariables as text
  left_join(df2 %>% 
              mutate_all(as.character) ## all variables as text
            , by = "id") %>% ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" take this,else  a.x 
         b = case_when(b.y == "NA" ~ "NA", TRUE  ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c) ## keep only these initial columns

  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

##if your dataframe head real NA this is how you can test:
missing_value <- NA

is.na(missing_value) ## TRUE
missing_value == NA  ## Does not work with R

Upvotes: 1

akrun

Reputation: 887951

We can do a join on 'id' and then replace the NA values by multiplying the .

library(data.table)
nm1 <- names(df1)[-1]
setDT(df1)[df2,  (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD, 
                  mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]
df1
#   id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA

data

df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))

NOTE: In the OP's post NA were "NA"

Upvotes: 3

Replacing values of dataframe with NA values of another dataframe

Answers (3)

data

Related Questions