ArOk
ArOk

Reputation: 193

Replace NAs by Interpolation in groups

I want to know how to replace NA values in a specific column by groups with an interpolation. Some of my groups only have one non-NA I would like to fill those groups with the one non-NA

If I have a dataframe like:

Group Value
ALB     NA
ALB     10
ALB     NA
ALB     12
ARE     NA
ARE     NA
ARE     2
ARE     NA
ARE     NA
ARG     4
ARG     NA
ARG     6

I want to create a new column, so my dataframe would look like the following:

Group Value New Column
ALB     NA    9
ALB     10    10
ALB     NA    11
ALB     12    12
ARE     NA    2
ARE     NA    2
ARE     2     2
ARE     NA    2
ARE     NA    2
ARG     4     4
ARG     NA    5
ARG     6     6

Upvotes: 1

Views: 1458

Answers (4)

G. Grothendieck
G. Grothendieck

Reputation: 269654

This one-liner will interpolate the NAs by group and for NAs on the ends of a group will extend the nearest non-NA to it giving it the same value, i.e. it does linear interpolation and constant extrapolation, which is not exactly what was asked for but may be close enough. Note that this also implies that if there is only one non-NA then all NAs are set to it.

library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))

giving:

   Group Value newCol
1    ALB    NA     10
2    ALB    10     10
3    ALB    NA     11
4    ALB    12     12
5    ARE    NA      2
6    ARE    NA      2
7    ARE     2      2
8    ARE    NA      2
9    ARE    NA      2
10   ARG     4      4
11   ARG    NA      5
12   ARG     6      6

Note

DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA, 
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA, 
-8L))

DF <- 
  structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
  2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"), 
  Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
  )), class = "data.frame", row.names = c(NA, -12L))

Upvotes: 3

MKR
MKR

Reputation: 20095

One can consider using Hmisc::approxExtrap function to perform both interpolate and extrapolate missing values. You have to provide both x and y values without NA which will be used as reference to interpolate/extrapolate missing values. The function returns for required set/rows (as passed with argument xout) after interpolating/extrapolating.

library(Hmisc)
library(dplyr)

df %>% group_by(Group) %>%
mutate(newVal = 
approxExtrap(which(!is.na(Value)), Value[!is.na(Value)],xout = 1:n(), rule=2)$y) %>%
  as.data.frame()

#   Group Value newVal
# 1     1    NA      9
# 2     1    10     10
# 3     1    NA     11
# 4     1    12     12
# 5     2     4      4
# 6     2    NA      5
# 7     2    NA      6
# 8     2     7      7

Data:

df <- read.table(text =
"Group Value
1     NA
1     10
1     NA
1     12
2     4
2     NA
2     NA
2     7",
header = TRUE)

Upvotes: 0

St&#233;phane Laurent
St&#233;phane Laurent

Reputation: 84529

df <- data.frame(
  group = rep(1:2, each = 4), 
  value = c(NA, 10, NA, 12, 4, NA, NA, 7))

complete <- function(x){
  i <- which.min(is.na(x))
  y <- seq_along(x) + x[i] - i
  return(y)
}

newdf <- do.call(rbind, 
                 lapply(split(df, df$group), 
                        function(dat){
                          transform(dat, newvalue=complete(value))
                        }))
rownames(newdf) <- NULL

This gives:

> newdf
  group value newvalue
1     1    NA        9
2     1    10       10
3     1    NA       11
4     1    12       12
5     2     4        4
6     2    NA        5
7     2    NA        6
8     2     7        7

Upvotes: 1

see-king_of_knowledge
see-king_of_knowledge

Reputation: 523

Check out na.approx in zoo package:

https://www.rdocumentation.org/packages/zoo/versions/1.8-2/topics/na.approx

You can use split on your data and then use apply.

 > library(zoo)
 > df<- data.frame(
      group = rep(1:2, each = 4), 
      value = c(NA, 10, NA, 12, 4, NA, NA, 7))
 > df
     group value
 1     1    NA
 2     1    10
 3     1    NA
 4     1    12
 5     2     4
 6     2    NA
 7     2    NA
 8     2     7
 > dfl <- split(df$value, df$group)
 > dfl
   $`1`
   [1] NA 10 NA 12

   $`2`
   [1]  4 NA NA  7

 > lapply(dfl, na.approx)
   $`1`
   [1] 10 11 12

   $`2`
   [1] 4 5 6 7

However, this will work if you have upper and lower limit in each group. Otherwise you will have issue like in the first group where it couldn't determine what to replace NA with.

Upvotes: 0

Related Questions