Srm Murty
Srm Murty

Reputation: 135

How to place not available values with previous nearest with respect to corresponding value

Hi i am having data frame ,how to replace NA values in "Val_1" with respect to nearest value of Val_2
for e.g Val_1 at ID -4 value is missing and corresponding value of Val_2 is "33.3" we need to replace with nearest value in Val_2 i.e 45 (previous nearest value is 45) also ID-8 with 33 (nearest value of 44.6 is 44.5)

ID  Date      Val_1 Val_2
1   01-02-2014  NA  22
2   02-02-2014  23  NA
3   03-02-2014  45  33
4   04-02-2014  NA  33.3
5   05-02-2014  45  46
6   06-02-2014  33  44.5
7   07-02-2014  56  48
8   08-02-2014  NA  44.6
9   09-02-2014  10  43
10  10-02-2014  14  56
11  11-02-2014  NA  NA
12  12-02-2014  22  22

we can replace NA value by

library(zoo)
na.locf(na.locf(DF$Val_1), fromLast = TRUE)
but above code replace with previous value from the same column
o/p :
ID  Date    Val_1   Val_2
1   01-02-2014  NA  22
2   02-02-2014  23  NA
3   03-02-2014  45  33
4   04-02-2014  45  33.3
5   05-02-2014  45  46
6   06-02-2014  33  44.5
7   07-02-2014  56  48
8   08-02-2014  33  44.6
9   09-02-2014  10  43
10  10-02-2014  14  56
11  11-02-2014  NA  NA
12  12-02-2014  22  22

Thanks

Upvotes: 0

Views: 46

Answers (1)

Gabriel Mota
Gabriel Mota

Reputation: 312

Sorry but I couldn't think of any simpler way:

# To use pipes
library(dplyr)

# Give a threshold. Nearest values must have a difference below this threshold
diff.threshold <- 0.5

# Create a vector with IDs that must have Val_1 updated 
IDtoReplace <- DF %>% filter(is.na(Val_1), !is.na(Val_2)) %>%
  select(ID) %>%
  unlist()

for (id in IDtoReplace){
  # Get Val_2 from current id
  curVal2 <- DF %>% filter(ID==id) %>% select(Val_2) %>% unlist()

  # Get value to be input
  valuetoinput <- DF %>% filter(!is.na(Val_1),!is.na(Val_2),ID < id) %>%    # Filter out all NA values and keep only previous ID
                    mutate(diff = abs(Val_2-curVal2)) %>%                   # Calculate all the differentes
                    filter(diff==min(diff),diff<=diff.threshold) %>%        # Keep row with minimum difference (it has to be below the threshold)
                    select(Val_1) %>%                                       # Select Val_1
                    unlist()

  # If any value is found, replace it in the data frame
  if(length(valuetoinput)>0)
    DF[which(DF$ID==id),"Val_1"] <- valuetoinput
}

And as result:

> DF
   ID       Date Val_1 Val_2
1   1 01-02-2014    NA  22.0
2   2 02-02-2014    23    NA
3   3 03-02-2014    45  33.0
4   4 04-02-2014    45  33.3
5   5 05-02-2014    45  46.0
6   6 06-02-2014    33  44.5
7   7 07-02-2014    56  48.0
8   8 08-02-2014    33  44.6
9   9 09-02-2014    10  43.0
10 10 10-02-2014    14  56.0
11 11 11-02-2014    NA    NA
12 12 12-02-2014    22  22.0

Will you use something similar very often? If yes, I suggest you to rewrite the for loop as a function.

Upvotes: 1

Related Questions