Nick Knauer
Nick Knauer

Reputation: 4243

Replace NA with value from another column with corresponding column name

I have a dataset that looks like below:

Col1    Col2    Col3     Col4   Last_Col1    Last_Col2   Last_Col3    Last_Col4
  NA       1       4        7           9           10          11           12
  NA      NA       4       NA          NA            9          NA           10
   8      NA       9       10          11           12          20           49
   9       7      NA       NA          34            2           3           50

How do I replace the NA values without Last_ in the beginning of the name with with the corresponding columns with Last_ in the name? I then want to remove the columns that have Last_ in the name.

Final expected output:

Col1    Col2    Col3     Col4   
   9       1       4        7    
  NA       9       4       10    
   8      12       9       10    
   9       7       3       50

Any help would be great, thanks!

Upvotes: 2

Views: 55

Answers (3)

s_baldur
s_baldur

Reputation: 33498

A data.table solution (will be very fast on bigger datasets):

ourcols <- paste0("Col", 1:4)
for (col in ourcols) {
  rows = which(is.na(dt[[col]]))
  set(x = dt, i = rows, j = col, value = dt[rows, get(paste0("Last_", col))])
}
dt[, ..ourcols]
   Col1 Col2 Col3 Col4
1:    9    1    4    7
2:   NA    9    4   10
3:    8   12    9   10
4:    9    7    3   50

Upvotes: 1

akrun
akrun

Reputation: 886938

As it is the corresponding columns, we subset the columns that have 'NA' to be replaced ('nm1') and the columns having 'Last' as prefix ('nm2'), create a logical matrix on the first set of columns ('i1'), use that to assign the values of the second set of values that corresponds to NA

nm1 <- names(df1)[1:4]
nm2 <- names(df1)[5:8]

Or use

nm1 <- names(df1)[startsWith(names(df1), "Col")]
nm2 <- names(df1)[startsWith(names(df1), "Last_")]
i1 <- is.na(df1[nm1])
df1[nm1][i1] <- df1[nm2][i1] 
newdf <- df1[nm1]
newdf
#  Col1 Col2 Col3 Col4
#1    9    1    4    7
#2   NA    9    4   10
#3    8   12    9   10
#4    9    7    3   50

data

df1 <- structure(list(Col1 = c(NA, NA, 8L, 9L), Col2 = c(1L, NA, NA, 
7L), Col3 = c(4L, 4L, 9L, NA), Col4 = c(7L, NA, 10L, NA), Last_Col1 = c(9L, 
 NA, 11L, 34L), Last_Col2 = c(10L, 9L, 12L, 2L), Last_Col3 = c(11L, 
 NA, 20L, 3L), Last_Col4 = c(12L, 10L, 49L, 50L)), 
 class = "data.frame", row.names = c(NA, -4L))

Upvotes: 4

Roman Luštrik
Roman Luštrik

Reputation: 70623

Split the data into two data.frames, find which are missing in those designated Col_X and overwrite them with values from the data.frame that starts with Last_.

xy <- read.table(text = "Col1    Col2    Col3     Col4   Last_Col1    Last_Col2   Last_Col3    Last_Col4
  NA       1       4        7           9           10          11           12
  NA      NA       4       NA          NA            9          NA           10
   8      NA       9       10          11           12          20           49
   9       7      NA       NA          34            2           3           50", header = TRUE)

xy1 <- xy[, grepl("^Col\\d+$", names(xy))]
xy2 <- xy[, grepl("^Last_Col\\d+$", names(xy))]

xy1[is.na(xy1)] <- xy2[is.na(xy1)]

> xy1
  Col1 Col2 Col3 Col4
1    9    1    4    7
2   NA    9    4   10
3    8   12    9   10
4    9    7    3   50

Upvotes: 3

Related Questions