For example, suppose I have a variable that takes two values, Left and Right, with the following counts:
Left Right
973  897
Say I also have 500 missing values. The share of missing values replaced with Left would be 973/(973+897), and the share replaced with Right would be 897/(973+897).
How can I do this? Or is this a bad idea?
Upvotes: 1
Views: 265
Reputation: 388807
If your dataset is something like this -
vec <- sample(rep(c('Left', 'Right', NA), c(10, 15, 10)))
You can compute the proportions and replace the NA values like this -
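# Number of NAs to fill with each value: observed proportion * NA count
prop <- prop.table(table(vec)) * sum(is.na(vec))
# Repeat each value that many times, shuffle, and assign to the NA positions
vec[is.na(vec)] <- sample(rep(names(prop), prop))
vec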
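One caveat with the rep() approach: it relies on the per-value counts being whole numbers. With the counts from the question (973, 897 and 500 missing) they come out at roughly 260.2 and 239.8, and rep() truncates fractional counts, so the replacement vector can come up one short. A minimal sketch of an alternative, assuming it is acceptable for the filled-in shares to match the observed proportions only on average, is to draw each replacement with sample() and its prob argument. The seed and the n_missing and fill names are just illustrative.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Data shaped like the question: 973 Left, 897 Right, 500 missing
vec <- sample(rep(c("Left", "Right", NA), c(973, 897, 500)))

# Observed proportions (table() drops NA by default)
prop <- prop.table(table(vec))

# Draw one replacement per NA, weighted by the observed proportions
n_missing <- sum(is.na(vec))
fill <- sample(names(prop), size = n_missing, replace = TRUE,
               prob = as.numeric(prop))
vec[is.na(vec)] <- fill

table(vec, useNA = "ifany")
```

Because each NA is drawn independently, the number of Left and Right replacements varies from run to run instead of being fixed.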
Upvotes: 1
Reputation: 21400
If interpolation would help, here is how it works in principle:
Test data:
set.seed(123)
df <- data.frame(
  left = c(rnorm(5), NA, NA, rnorm(5), NA, rnorm(5))
)
Solution: To replace the NA values with linearly interpolated values, use zoo's function na.approx:
library(zoo)
library(dplyr)
df %>%
  mutate(left_intpl = na.approx(left))
left left_intpl
1 -0.56047565 -0.56047565
2 -0.23017749 -0.23017749
3 1.55870831 1.55870831
4 0.07050839 0.07050839
5 0.12928774 0.12928774
6 NA 0.65788015
7 NA 1.18647257
8 1.71506499 1.71506499
9 0.46091621 0.46091621
10 -1.26506123 -1.26506123
11 -0.68685285 -0.68685285
12 -0.44566197 -0.44566197
13 NA 0.38920991
14 1.22408180 1.22408180
15 0.35981383 0.35981383
16 0.40077145 0.40077145
17 0.11068272 0.11068272
18 -0.55584113 -0.55584113
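One thing to watch for: the test data above has no NA at the very start or end. By default na.approx drops leading and trailing NA values, which changes the length of the result and would make the mutate() call fail. A small sketch with a made-up vector x, assuming you want those edge NAs kept rather than extrapolated:

```r
library(zoo)

# Hypothetical vector with NAs at both ends and one in the middle
x <- c(NA, 1, NA, 3, NA)

# na.rm = FALSE keeps the leading/trailing NAs, so the length is unchanged
res <- na.approx(x, na.rm = FALSE)
res
```

Only the interior NA is interpolated here; the two edge values stay NA because there is nothing on one side to interpolate from.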
Upvotes: 0