user15300490

Is there a way to replace missing values based on percentage?

For example, suppose I have a variable that takes two values, Left and Right, with the following counts:

Left Right 
973   897 

Say I also have 500 missing values. The share of missing values replaced with Left would be 973/(973+897), and the share replaced with Right would be 897/(973+897).

How can I do this? Or is this a bad idea?

Upvotes: 1

Views: 265

Answers (2)

Ronak Shah

Reputation: 388807

If your data looks something like this:

vec <- sample(rep(c('Left', 'Right', NA), c(10, 15, 10)))

You can compute how many NA values each level should receive, based on the observed proportions, and then fill them in:

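# Number of NAs to assign to each level (table() ignores NA by default)
prop <- prop.table(table(vec)) * sum(is.na(vec))
# Repeat each level that many times, shuffle, and fill the NA slots.
# Note: rep() truncates non-integer counts, so this relies on whole-number counts.
vec[is.na(vec)] <- sample(rep(names(prop), prop))
vec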
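
With the counts from the question (973 Left, 897 Right, 500 missing), the target counts are not whole numbers (about 260.16 and 239.84), so `rep(names(prop), prop)` would truncate them and come up one value short. Below is a minimal sketch of two ways around that, not a definitive implementation. The data is simulated to match the question's counts, and the names `n_na`, `target`, `counts`, `filled_random`, and `filled_exact` are made up for illustration.

```r
# Hypothetical data matching the counts in the question
set.seed(42)  # only to make the sketch reproducible
vec <- sample(rep(c("Left", "Right", NA), c(973, 897, 500)))

n_na <- sum(is.na(vec))
prop <- prop.table(table(vec))  # observed proportions; NA is ignored

# Option 1: draw each replacement at random with the observed probabilities.
# The proportions then hold in expectation, not exactly.
filled_random <- vec
filled_random[is.na(filled_random)] <-
  sample(names(prop), n_na, replace = TRUE, prob = prop)

# Option 2: fixed counts that always add up to n_na.
target <- as.vector(prop) * n_na   # 260.16 and 239.84 here
counts <- floor(target)
remainder <- n_na - sum(counts)
# Hand the leftover slots to the levels with the largest fractional parts
top <- order(target - counts, decreasing = TRUE)[seq_len(remainder)]
counts[top] <- counts[top] + 1     # 260 Left, 240 Right

filled_exact <- vec
filled_exact[is.na(filled_exact)] <- sample(rep(names(prop), counts))

table(filled_exact)
```

Option 1 is the simpler of the two; Option 2 is closer to the exact split described in the question. Either way, the filled-in values are random guesses that only preserve the marginal proportions.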

Upvotes: 1

Chris Ruehlemann

Reputation: 21400

If interpolating can help, this is how, in principle, it works:

Test data:

set.seed(123)
df <- data.frame(
   left = c(rnorm(5), NA, NA, rnorm(5), NA, rnorm(5))
)

Solution: To replace the NA values with linearly interpolated values, use the na.approx function from the zoo package:

library(zoo)
library(dplyr)
df %>%
   mutate(left_intpl = na.approx(left))

Result:

          left  left_intpl
1  -0.56047565 -0.56047565
2  -0.23017749 -0.23017749
3   1.55870831  1.55870831
4   0.07050839  0.07050839
5   0.12928774  0.12928774
6           NA  0.65788015
7           NA  1.18647257
8   1.71506499  1.71506499
9   0.46091621  0.46091621
10 -1.26506123 -1.26506123
11 -0.68685285 -0.68685285
12 -0.44566197 -0.44566197
13          NA  0.38920991
14  1.22408180  1.22408180
15  0.35981383  0.35981383
16  0.40077145  0.40077145
17  0.11068272  0.11068272
18 -0.55584113 -0.55584113
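
To see what linear interpolation does with a gap, here is a small base R sketch that uses `approx()` instead of zoo. The toy vector `x` and the names `idx`, `ok`, and `filled` are made up for illustration, and this is only meant to mirror the idea, not to reproduce `na.approx` in every detail.

```r
# Toy vector (hypothetical values chosen so the result is easy to check by hand)
x <- c(1, NA, NA, 4, NA, 6)

idx <- seq_along(x)   # positions 1..6 serve as the x-axis
ok  <- !is.na(x)      # which positions have observed values

# approx() draws straight lines between the observed points;
# xout asks for a value at every position of the original vector
filled <- approx(idx[ok], x[ok], xout = idx)$y
filled
# [1] 1 2 3 4 5 6
```

Interpolation like this only applies to numeric data with a meaningful ordering. Also note that values before the first or after the last observed point cannot be interpolated: `approx()` leaves them as NA by default, and `na.approx()` drops them unless called with `na.rm = FALSE`, which matters inside `mutate()` because the result must keep the original length.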

Upvotes: 0
