Reputation: 135
Hi i am having data frame ,how to replace NA values in "Val_1" with respect to nearest value of Val_2
for e.g Val_1 at ID -4 value is missing and corresponding value of Val_2 is "33.3" we need to replace with nearest value in Val_2 i.e 45 (previous nearest value is 45) also ID-8 with 33 (nearest value of 44.6 is 44.5)
ID Date Val_1 Val_2
1 01-02-2014 NA 22
2 02-02-2014 23 NA
3 03-02-2014 45 33
4 04-02-2014 NA 33.3
5 05-02-2014 45 46
6 06-02-2014 33 44.5
7 07-02-2014 56 48
8 08-02-2014 NA 44.6
9 09-02-2014 10 43
10 10-02-2014 14 56
11 11-02-2014 NA NA
12 12-02-2014 22 22
we can replace NA value by
library(zoo)
na.locf(na.locf(DF$Val_1), fromLast = TRUE)
but above code replace with previous value from the same column
o/p :
ID Date Val_1 Val_2
1 01-02-2014 NA 22
2 02-02-2014 23 NA
3 03-02-2014 45 33
4 04-02-2014 45 33.3
5 05-02-2014 45 46
6 06-02-2014 33 44.5
7 07-02-2014 56 48
8 08-02-2014 33 44.6
9 09-02-2014 10 43
10 10-02-2014 14 56
11 11-02-2014 NA NA
12 12-02-2014 22 22
Thanks
Upvotes: 0
Views: 46
Reputation: 312
Sorry but I couldn't think of any simpler way:
# To use pipes
library(dplyr)
# Give a threshold. Nearest values must have a difference below this threshold
diff.threshold <- 0.5
# Create a vector with IDs that must have Val_1 updated
IDtoReplace <- DF %>% filter(is.na(Val_1), !is.na(Val_2)) %>%
select(ID) %>%
unlist()
for (id in IDtoReplace){
# Get Val_2 from current id
curVal2 <- DF %>% filter(ID==id) %>% select(Val_2) %>% unlist()
# Get value to be input
valuetoinput <- DF %>% filter(!is.na(Val_1),!is.na(Val_2),ID < id) %>% # Filter out all NA values and keep only previous ID
mutate(diff = abs(Val_2-curVal2)) %>% # Calculate all the differentes
filter(diff==min(diff),diff<=diff.threshold) %>% # Keep row with minimum difference (it has to be below the threshold)
select(Val_1) %>% # Select Val_1
unlist()
# If any value is found, replace it in the data frame
if(length(valuetoinput)>0)
DF[which(DF$ID==id),"Val_1"] <- valuetoinput
}
And as result:
> DF
ID Date Val_1 Val_2
1 1 01-02-2014 NA 22.0
2 2 02-02-2014 23 NA
3 3 03-02-2014 45 33.0
4 4 04-02-2014 45 33.3
5 5 05-02-2014 45 46.0
6 6 06-02-2014 33 44.5
7 7 07-02-2014 56 48.0
8 8 08-02-2014 33 44.6
9 9 09-02-2014 10 43.0
10 10 10-02-2014 14 56.0
11 11 11-02-2014 NA NA
12 12 12-02-2014 22 22.0
Will you use something similar very often? If yes, I suggest you to rewrite the for
loop as a function.
Upvotes: 1