Context: I'm trying to fill the NA values in a column of my data called "Cholesterol" with a vector of sampled values, but I couldn't find anything that helps with that. I've tried using replace_na, but it is not replacing the NA values.
MRE:
69 181 308 166 211 257 182 NA NA NA NA NA NA NA
[301] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[331] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[361] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[391] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 260 209 218 228
[421] 213 NA 236 NA NA 267 166 NA NA NA NA 220 177 236 NA NA NA NA NA NA NA NA NA 186 100 228 NA 171 230 NA
[451] NA NA 281 NA 203 NA NA NA NA NA 277 NA 233 NA NA 240 NA NA 153 224 NA NA NA 316 NA NA 218 NA 311 NA
[481] NA NA 270 NA NA 217 214 214 252 220 214 203 NA 339 216 276 458 241 384 297 248 308 208 227
missing_values = sum(is.na(df$Cholesterol))
missing_values
# Set seed
set.seed(42)
fill_NA_values_cholesterol = sample(rnorm(n = missing_values, mean = mean(cholesterol_sem_valores_nulos, trim = 0.2), sd = mad(cholesterol_sem_valores_nulos)), size = missing_values)
The variable cholesterol_sem_valores_nulos is simply a different vector that contains only the filled values (no NAs are present in this vector).
How could I make the code fill the NA values using the vector fill_NA_values_cholesterol? The number of NA values in df$Cholesterol is 172 (the same as the length of fill_NA_values_cholesterol).
Thank you in advance
Upvotes: 0
Views: 72
Reputation: 3212
Here is an example where I use purrr together with the rnorm() function you specified to replace the NA values.
library(dplyr)

# Some example data
df <- tibble(
  Cholesterol = c(NA, 1:3, NA)
)

# I wrap the draw in a function to save some space below, but this is not
# necessary
draw_random_based_on <- function(x) {
  rnorm(
    n = 1,
    mean = mean(x, trim = 0.2, na.rm = TRUE),
    sd = mad(x, na.rm = TRUE)
  )
}

# Below I add a new column, Cholesterol2, where non-missing values are the
# same as in Cholesterol, and each missing value is replaced by a separate
# draw from the random function you specified
df %>%
  mutate(
    Cholesterol2 = purrr::map_dbl(
      Cholesterol,
      ~ifelse(
        is.na(.x),
        draw_random_based_on(df$Cholesterol),
        .x
      )
    )
  )
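If you would rather use a pre-computed vector such as your fill_NA_values_cholesterol, a base R sketch with logical indexing might be enough. As far as I know, replace_na() expects a single replacement value when it is given a vector, which is probably why it did not accept a vector of 172 values. The small data frame and the names observed, na_idx, and fill_values below are made up for illustration; in your case the fill vector would be fill_NA_values_cholesterol.

```r
# Illustrative data standing in for the real data frame (values are made up)
df <- data.frame(Cholesterol = c(NA, 181, 308, NA, 211, NA))

# Observed (non-missing) values, used to parameterise the random draws
observed <- df$Cholesterol[!is.na(df$Cholesterol)]

# Logical vector marking the positions of the missing values
na_idx <- is.na(df$Cholesterol)

set.seed(42)
# One draw per missing value, using the same mean/sd choices as the question
fill_values <- rnorm(
  n = sum(na_idx),
  mean = mean(observed, trim = 0.2),
  sd = mad(observed)
)

# Assign the i-th fill value to the i-th NA position; this requires the fill
# vector to have exactly as many elements as there are NAs
df$Cholesterol[na_idx] <- fill_values
```

This overwrites the column in place, so assign to a new column instead (for example df$Cholesterol2) if you want to keep the original values for comparison.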
Upvotes: 0