Reputation: 4243
I have a dataset that looks like below:
Col1 Col2 Col3 Col4 Last_Col1 Last_Col2 Last_Col3 Last_Col4
NA 1 4 7 9 10 11 12
NA NA 4 NA NA 9 NA 10
8 NA 9 10 11 12 20 49
9 7 NA NA 34 2 3 50
How do I replace the NA values without Last_
in the beginning of the name with with the corresponding columns with Last_
in the name? I then want to remove the columns that have Last_
in the name.
Final expected output:
Col1 Col2 Col3 Col4
9 1 4 7
NA 9 4 10
8 12 9 10
9 7 3 50
Any help would be great, thanks!
Upvotes: 2
Views: 55
Reputation: 33498
A data.table
solution (will be very fast on bigger datasets):
ourcols <- paste0("Col", 1:4)
for (col in ourcols) {
rows = which(is.na(dt[[col]]))
set(x = dt, i = rows, j = col, value = dt[rows, get(paste0("Last_", col))])
}
dt[, ..ourcols]
Col1 Col2 Col3 Col4
1: 9 1 4 7
2: NA 9 4 10
3: 8 12 9 10
4: 9 7 3 50
Upvotes: 1
Reputation: 886938
As it is the corresponding columns, we subset the columns that have 'NA' to be replaced ('nm1') and the columns having 'Last' as prefix ('nm2'), create a logical matrix on the first set of columns ('i1'), use that to assign the values of the second set of values that corresponds to NA
nm1 <- names(df1)[1:4]
nm2 <- names(df1)[5:8]
Or use
nm1 <- names(df1)[startsWith(names(df1), "Col")]
nm2 <- names(df1)[startsWith(names(df1), "Last_")]
i1 <- is.na(df1[nm1])
df1[nm1][i1] <- df1[nm2][i1]
newdf <- df1[nm1]
newdf
# Col1 Col2 Col3 Col4
#1 9 1 4 7
#2 NA 9 4 10
#3 8 12 9 10
#4 9 7 3 50
df1 <- structure(list(Col1 = c(NA, NA, 8L, 9L), Col2 = c(1L, NA, NA,
7L), Col3 = c(4L, 4L, 9L, NA), Col4 = c(7L, NA, 10L, NA), Last_Col1 = c(9L,
NA, 11L, 34L), Last_Col2 = c(10L, 9L, 12L, 2L), Last_Col3 = c(11L,
NA, 20L, 3L), Last_Col4 = c(12L, 10L, 49L, 50L)),
class = "data.frame", row.names = c(NA, -4L))
Upvotes: 4
Reputation: 70623
Split the data into two data.frames, find which are missing in those designated Col_X
and overwrite them with values from the data.frame that starts with Last_
.
xy <- read.table(text = "Col1 Col2 Col3 Col4 Last_Col1 Last_Col2 Last_Col3 Last_Col4
NA 1 4 7 9 10 11 12
NA NA 4 NA NA 9 NA 10
8 NA 9 10 11 12 20 49
9 7 NA NA 34 2 3 50", header = TRUE)
xy1 <- xy[, grepl("^Col\\d+$", names(xy))]
xy2 <- xy[, grepl("^Last_Col\\d+$", names(xy))]
xy1[is.na(xy1)] <- xy2[is.na(xy1)]
> xy1
Col1 Col2 Col3 Col4
1 9 1 4 7
2 NA 9 4 10
3 8 12 9 10
4 9 7 3 50
Upvotes: 3