Bamqf

Reputation: 3542

R: Impute missing data with the mean of the nearest previous and next non-missing values

Assume the data look like:

df <- data.frame(ID=1:6, Value=c(NA, 1, NA, NA, 2, NA))
df
  ID Value
1  1    NA
2  2     1
3  3    NA
4  4    NA
5  5     2
6  6    NA

I want the imputed result to look like this:

  ID Value
1  1   1.0
2  2   1.0
3  3   1.5
4  4   1.5
5  5   2.0
6  6   2.0

More specifically, I want to impute each missing value with the mean of the nearest previous and the nearest following non-missing values. If only one of the two exists, impute with that value. The behavior when all data are missing is not defined.

How can I do that in R?

Upvotes: 0

Views: 368

Answers (3)

G. Grothendieck

Reputation: 269526

Use na.locf both forwards and backwards and take their average:

library(zoo)

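# Column 1: last observation carried forward; column 2: next observation carried backward.
both <- cbind(na.locf(df$Value, na.rm = FALSE),
              na.locf(df$Value, na.rm = FALSE, fromLast = TRUE))
# Row-wise mean; na.rm = TRUE falls back to the single available neighbour at the ends.
transform(df, Value = rowMeans(both, na.rm = TRUE))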

giving:

  ID Value
1  1   1.0
2  2   1.0
3  3   1.5
4  4   1.5
5  5   2.0
6  6   2.0
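
If zoo is not available, the same forward/backward idea can be sketched in base R. This is only an illustration: `fill_forward` is a hypothetical helper written for this example, not a zoo or base R function, and it assumes a plain numeric vector.

```r
df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))

# Hypothetical helper: carry the last non-missing value forward.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # number of non-NA values seen so far (0 before the first)
  c(NA, x[!is.na(x)])[idx + 1]    # 0 maps to NA, k maps to the k-th non-NA value
}

fwd <- fill_forward(df$Value)             # previous observed value
bwd <- rev(fill_forward(rev(df$Value)))   # next observed value

# Average the two; na.rm = TRUE keeps the single neighbour at the ends.
result <- transform(df, Value = rowMeans(cbind(fwd, bwd), na.rm = TRUE))
result
```

Reversing the vector, filling forward, and reversing again is what gives the backward fill here, so only one helper is needed.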

Upvotes: 1

IRTFM

Reputation: 263332

Take a look at the design of approxfun with rule=2. This isn't exactly what you asked for (since it does a linear interpolation across the NA gaps rather than substituting the mean of the gap endpoints), but it might be acceptable:

> approxfun(df$ID, df$Value, rule=2)(df$ID)
[1] 1.000000 1.000000 1.333333 1.666667 2.000000 2.000000

With rule=2 it does behave as you wanted at the extremes: positions outside the observed range take the nearest observed value. There are also na.approx methods in the zoo package.
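
As a possible variation on the call above, `approx` also accepts `method = "constant"` with a compromise factor `f`. With `f = 0.5`, each position inside a gap gets the average of the observed values on either side, which seems to match the output requested in the question. A minimal sketch, using only base R and the same `df`:

```r
df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))

# Linear interpolation across the gaps, as in the approxfun call above.
linear <- approx(df$ID, df$Value, xout = df$ID, rule = 2)$y

# Step interpolation: inside a gap the value is (1 - f) * left + f * right,
# so f = 0.5 gives the mean of the two gap endpoints.
midpoint <- approx(df$ID, df$Value, xout = df$ID,
                   method = "constant", f = 0.5, rule = 2)$y

data.frame(ID = df$ID, linear = linear, midpoint = midpoint)
```

Observed positions keep their own values in both versions; only the filled-in positions differ.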

I would caution against using such data for any further statistical inference. This method of imputation is essentially saying there is no possibility of random variation during periods of no measurement, and the world is generally not so consistent.

Upvotes: 1

Buzz Lightyear

Reputation: 844

This should work.

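for (i in seq_len(nrow(df))) {
    if (is.na(df$Value[i])) {
        # Mean of the values up to row i. na.rm = TRUE is required because
        # df$Value[i] is itself NA, so the mean would otherwise always be NA.
        # A leading NA has nothing before it and becomes NaN.
        df$Value[i] <- mean(df$Value[1:i], na.rm = TRUE)
    }
}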

I don't know if this is exactly what you want, because I didn't understand this statement: "I want to impute missing data with mean of first previous and latter non missing data, if only one of previous or latter non missing data exist, impute with this non missing data"

What values do you want to find the mean of to replace the NAs?
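
Reading the quoted statement against the expected output in the question, the values being averaged appear to be the nearest observed value before and the nearest observed value after each NA. A loop-based sketch of that reading follows; it is an interpretation rather than a confirmed specification, and the names `orig`, `obs`, and `neighbours` are made up for the example.

```r
df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))

orig <- df$Value              # untouched copy, so imputed values are never reused
obs  <- which(!is.na(orig))   # positions of the observed values

for (i in which(is.na(orig))) {
  prev_pos <- obs[obs < i]    # observed positions before row i
  next_pos <- obs[obs > i]    # observed positions after row i
  # Collect whichever neighbours exist (one at the ends, two inside a gap).
  neighbours <- c(
    if (length(prev_pos)) orig[max(prev_pos)],
    if (length(next_pos)) orig[min(next_pos)]
  )
  df$Value[i] <- mean(neighbours)
}
df
```

Working from the copy `orig` matters: looking up neighbours in `df$Value` while it is being filled would let earlier imputed values leak into later ones.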

Upvotes: 0
