NoraNorad

Reputation: 27

mean-before-after imputation in R

I'm new to R. How can I impute a missing value using the mean of the values immediately before and after the missing data point?

Example:

Use the mean of the values directly above and below each NA as the imputed value.

- The mean for row number 3 is (27 + 23) / 2 = 25

- The mean for row number 7 is (32 + 33) / 2 = 32.5

age
52.0
27.0
NA
23.0
39.0
32.0
NA
33.0
43.0

Thank you.

Upvotes: 3

Views: 1728

Answers (4)

Steffen Moritz

Reputation: 7730

You are looking for moving average imputation. You can use the na_ma function from the imputeTS package for this.

library(imputeTS)
x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
na_ma(x, k=1, weighting = "simple")

[1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000

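For an isolated NA this produces exactly the required result: with k = 1 and simple weighting, the NA is replaced by the mean of its two direct neighbors (position 3 above). The k parameter specifies how many neighbors on each side are taken into account. For runs of consecutive NAs (positions 6 and 7 above), the window is widened until it contains at least two non-NA values, so those imputed values are means over a larger window.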
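To make the windowing concrete, here is a minimal base-R sketch of a simple moving-average fill. The helper name ma_fill is hypothetical and this is not the imputeTS implementation; it only assumes the behavior described above (k neighbors on each side, window widened until it holds at least two observed values).

```r
# Hypothetical helper for illustration only; not part of imputeTS.
ma_fill <- function(x, k = 1) {
  out <- x
  n <- length(x)
  for (i in which(is.na(x))) {
    w <- k
    repeat {
      idx <- max(1, i - w):min(n, i + w)   # window clipped to the vector bounds
      vals <- x[idx]
      vals <- vals[!is.na(vals)]           # keep only observed values
      # Stop once two observed values are inside the window,
      # or once the window already spans the whole vector.
      if (length(vals) >= 2 || (i - w <= 1 && i + w >= n)) break
      w <- w + 1
    }
    out[i] <- if (length(vals) > 0) mean(vals) else NA_real_
  }
  out
}

age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
ma_fill(age)
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
```

On the vector with consecutive NAs used in this answer, the sketch gives the same values as the na_ma output shown above (31.66667 and 38.33333 at positions 6 and 7).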

Upvotes: 2

agstudy

Reputation: 121568

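Here is a solution using na.locf from the zoo package, which replaces each NA with the most recent non-NA value before it or, with fromLast=TRUE, the next non-NA value after it. Averaging the two fills gives the mean of the neighbors.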

library(zoo)
x <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
0.5*(na.locf(x, fromLast=TRUE) + na.locf(x))
[1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0

The advantage here is that it also works when you have more than one consecutive NA:

x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
0.5*(na.locf(x, fromLast=TRUE) + na.locf(x))
[1] 52 27 25 23 39 36 36 33 43

EDIT: the rev argument is deprecated, so I replaced it with fromLast (note the capital L; argument names in R are case-sensitive).
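For readers without zoo installed, the same idea can be sketched in base R. The helpers fill_forward and fill_backward are hypothetical stand-ins written for illustration, not zoo functions. Unlike na.locf with its defaults, this sketch leaves leading or trailing NAs in place rather than dropping them.

```r
# Hypothetical base-R stand-ins for forward and backward fill.
fill_forward <- function(x) {
  # For each position, the index of the last observed value seen so far.
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA   # nothing observed yet: keep the NA
  x[idx]
}
fill_backward <- function(x) rev(fill_forward(rev(x)))

x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
# Average of the previous and the next observed value at every position.
0.5 * (fill_backward(x) + fill_forward(x))
# [1] 52 27 25 23 39 36 36 33 43
```

Observed values are unchanged because both fills return the value itself there, so only the NA positions receive an average.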

Upvotes: 5

johannes

Reputation: 14433

Just another way:

age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- apply(sapply(which(is.na(age)), "+", c(-1, 1)), 2, 
                         function(x) mean(age[x]))
age
## [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

This would be a basic manual approach you can take:

age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- rowMeans(cbind(age[which(is.na(age))-1], 
                                  age[which(is.na(age))+1]))
age
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0

Or, since you seem to have a single-column data.frame:

mydf <- data.frame(age = c(52, 27, NA, 23, 39, 32, NA, 33, 43))

mydf[is.na(mydf$age), ] <- rowMeans(
  cbind(mydf$age[which(is.na(mydf$age))-1],
        mydf$age[which(is.na(mydf$age))+1]))
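Note that indexing with -1 and +1 assumes every NA has an observed value on both sides. As a hedged sketch of one way to guard the edges, the hypothetical helper below pads the vector with NA at both ends and averages whatever neighbors exist. Falling back to the single available neighbor is an assumption about the desired behavior, not something the question specifies.

```r
# Hypothetical helper: mean of the direct neighbors, tolerant of NAs
# at the first or last position.
neighbor_mean_fill <- function(x) {
  na_idx <- which(is.na(x))
  if (length(na_idx) == 0) return(x)
  padded <- c(NA, x, NA)        # shift by one so index 0 and n + 1 exist
  before <- padded[na_idx]      # value at position i - 1 of x
  after  <- padded[na_idx + 2]  # value at position i + 1 of x
  # na.rm = TRUE uses the one available neighbor when the other is missing;
  # if both neighbors are missing the result is NaN.
  x[na_idx] <- rowMeans(cbind(before, after), na.rm = TRUE)
  x
}

neighbor_mean_fill(c(52, 27, NA, 23, 39, 32, NA, 33, 43))
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
neighbor_mean_fill(c(NA, 10, NA, 20, NA))
# [1] 10 10 15 20 20
```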

Upvotes: 1
