Rajarshi Bhadra
Rajarshi Bhadra

Reputation: 1944

Replace na in column by value corresponding to column name in seperate table

I have a data frame which looks like this

data <- data.frame(ID = c(1,2,3,4,5),A = c(1,4,NA,NA,4),B = c(1,2,NA,NA,NA),C= c(1,2,3,4,NA))

> data
  ID  A  B  C
1  1  1  1  1
2  2  4  2  2
3  3 NA NA  3
4  4 NA NA  4
5  5  4 NA NA

I have a mapping file as well which looks like this

reference <- data.frame(Names = c("A","B","C"),Vals = c(2,5,6))

> reference
  Names Vals
1     A    2
2     B    5
3     C    6

I want my data file to be modified using the reference file in a way which would yield me this final data frame

> final_data
  ID A B C
1  1 1 1 1
2  2 4 2 2
3  3 2 5 3
4  4 2 5 4
5  5 4 5 6

What is the fastest way I can acheive this in R?

Upvotes: 0

Views: 435

Answers (3)

bgoldst
bgoldst

Reputation: 35314

One approach is to compute a logical matrix of the target columns capturing which cells are NA. We can then index-assign the NA cells with the replacement values. The tricky part is ensuring the replacement vector aligns with the indexed cells:

im <- is.na(data[as.character(reference$Names)]);
data[as.character(reference$Names)][im] <- rep(reference$Vals,colSums(im));
data;
##   ID A B C
## 1  1 1 1 1
## 2  2 4 2 2
## 3  3 2 5 3
## 4  4 2 5 4
## 5  5 4 5 6

Upvotes: 3

akrun
akrun

Reputation: 886938

We can do this with Map

data[as.character(reference$Names)] <- Map(function(x,y) replace(x, 
         is.na(x), y), data[as.character(reference$Names)], reference$Vals)
data
#  ID A B C
#1  1 1 1 1
#2  2 4 2 2
#3  3 2 5 3
#4  4 2 5 4
#5  5 4 5 6

EDIT: Based on @thelatemail's comments.

NOTE: NO external packages used


As we are looking for efficient solution, another approach would be set from data.table

library(data.table)
setDT(data)
v1 <- as.character(reference$Names)

for(j in seq_along(v1)){
  set(data, i = which(is.na(data[[v1[j]]])), j= v1[j], value = reference$Vals[j]  )
 }

NOTE: Only a single efficient external package used.

Upvotes: 4

alistaire
alistaire

Reputation: 43334

If reference was the same wide format as data, dplyr's new (v. 0.5.0) coalesce function is built for replacing NAs; together with purrr, which offers alternate notations for *apply functions, it makes the process very simple:

library(dplyr)

# spread reference to wide, add ID column for mapping
reference_wide <- data.frame(ID = NA_real_, tidyr::spread(reference, Names, Vals))

reference_wide

#   ID A B C
# 1 NA 2 5 6

# now coalesce the two column-wise and return a df
purrr::map2_df(data, reference_wide, coalesce)

# Source: local data frame [5 x 4]
# 
#      ID     A     B     C
#   <dbl> <dbl> <dbl> <dbl>
# 1     1     1     1     1
# 2     2     4     2     2
# 3     3     2     5     3
# 4     4     2     5     4
# 5     5     4     5     6

Upvotes: 0

Related Questions