Reputation: 21
I would like to replace values of one dataframe with NA of another dataframe that have the same identifier. That is, for all values of df1 that have the same id, assign the "NA" values of df2 at the corresponding id and indices.
I have df1 and df2:
df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))
what i would like is df3:
df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))
I have tried the lookup function and the library "data.table", but i could get the correct df3. Could anyone please help me with this?
Upvotes: 2
Views: 171
Reputation: 378
First you need to transform your "NA"'s
with real NA's
:
df2[df2=="NA"]<-NA
Then, can perform the replacements with:
df1[is.na(df2[match(df1[,1],df2[,1]),])] <- NA
OUTPUT:
df1
id a b c
1 1 NA NA 9
2 1 NA NA 3
3 2 1 0 NA
4 2 7 3 NA
5 6 NA 2 NA
6 6 NA 5 NA
Upvotes: 0
Reputation: 472
Since your NA values are actually text "NA" you will have to turn all your variables into text (with as.character). You can join both dataframes by id column. Since both dataframes have columns a,b, and c R will rename then a.x, b.x and c.x (df1) and a.y, b.y and c.y (df2). After that you can create new columns a,b, and c. These than have "NA" whenever a.y == "NA" and a.x otherwise (and so on). If your NA values were real NA you need to test differently is.na(value) (see example below in the code).
library(dplyr)
df1 %>%
mutate_all(as.character) %>% # allvariables as text
left_join(df2 %>%
mutate_all(as.character) ## all variables as text
, by = "id") %>% ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" take this,else a.x
b = case_when(b.y == "NA" ~ "NA", TRUE ~ b.x),
c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
select(id, a, b, c) ## keep only these initial columns
id a b c
1 1 NA NA 9
2 1 NA NA 3
3 2 1 0 NA
4 2 7 3 NA
5 6 NA 2 NA
6 6 NA 5 NA
##if your dataframe head real NA this is how you can test:
missing_value <- NA
is.na(missing_value) ## TRUE
missing_value == NA ## Does not work with R
Upvotes: 1
Reputation: 887951
We can do a join on
'id' and then replace the NA values by multiplying the .
library(data.table)
nm1 <- names(df1)[-1]
setDT(df1)[df2, (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD,
mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]
df1
# id a b c
#1: 1 NA NA 9
#2: 1 NA NA 3
#3: 2 1 0 NA
#4: 2 7 3 NA
#5: 6 NA 2 NA
#6: 6 NA 5 NA
df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))
NOTE: In the OP's post NA
were "NA"
Upvotes: 3