Reputation: 21
I would like to replace all NA values in each column with that column's median.
id <- c(1,2,3,4,5,6,7,8,9,10)
varA <- c(15,10,8,19,7,5,NA,11,12,NA)
varB <- c(NA,1,2,3,4,3,3,2,1,NA)
df <- data.frame(id, varA,varB)
median(df$varA, na.rm=TRUE)
median(df$varB, na.rm=TRUE)
df1 <- df
# Columns to be modified with Median in place of the NA
col <- c("varA", "varB")
df1[col] <- sapply(df1[col],
function(x) replace(x, x %in% is.na(df1), median[col]))
df1
Error in [.default(df1, col) : invalid subscript type 'closure'
Upvotes: 2
Views: 39
Reputation: 8110
Another option, which is similar to your original attempt.
df1[col] <- apply(df1[col], 2, \(x) ifelse(is.na(x), median(x, na.rm = TRUE), x) )
df1
#>    id varA varB
#> 1   1 15.0  2.5
#> 2   2 10.0  1.0
#> 3   3  8.0  2.0
#> 4   4 19.0  3.0
#> 5   5  7.0  4.0
#> 6   6  5.0  3.0
#> 7   7 10.5  3.0
#> 8   8 11.0  2.0
#> 9   9 12.0  1.0
#> 10 10 10.5  2.5
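For comparison, here is a base R sketch that stays even closer to the code in the question. Two things in the original attempt stand out: `median[col]` indexes the function `median` instead of calling it, and `x %in% is.na(df1)` does not test which elements of `x` are missing. The version below computes the median inside the anonymous function and uses `is.na(x)` as the index. Using `lapply()` rather than `apply()` is my own choice here, since it avoids the intermediate conversion to a matrix.

```r
# Same example data as in the question
id   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
varA <- c(15, 10, 8, 19, 7, 5, NA, 11, 12, NA)
varB <- c(NA, 1, 2, 3, 4, 3, 3, 2, 1, NA)
df   <- data.frame(id, varA, varB)

# Columns whose NA values should be replaced by the column median
col <- c("varA", "varB")

df1 <- df
df1[col] <- lapply(df1[col], function(x) {
  # is.na(x) marks the missing positions; the median ignores them via na.rm
  replace(x, is.na(x), median(x, na.rm = TRUE))
})
df1
```

Because each column is processed separately as a vector, this should also behave sensibly when the selected columns have different numeric types.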
Upvotes: 0
Reputation: 700
dplyr + tidyr solution
library(dplyr)
library(tidyr)
df %>%
  mutate(varA = replace_na(varA, median(varA, na.rm = TRUE)),
         varB = replace_na(varB, median(varB, na.rm = TRUE)))
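If the columns are listed in a character vector, such as `col` in the question, the same replacement can be written once with `across()` instead of repeating it per column. This is only a sketch and assumes dplyr 1.0.0 or later (for `across()`); the result name `df2` is just for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  varA = c(15, 10, 8, 19, 7, 5, NA, 11, 12, NA),
  varB = c(NA, 1, 2, 3, 4, 3, 3, 2, 1, NA)
)
col <- c("varA", "varB")

# Apply the same NA-to-median replacement to every column named in `col`
df2 <- df %>%
  mutate(across(all_of(col), ~ replace_na(.x, median(.x, na.rm = TRUE))))
df2
```

Adding another column then only requires extending `col`.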
Upvotes: 1
Reputation: 887158
We may use na.aggregate() from the zoo package:
library(zoo)
df[col] <- na.aggregate(df[col], FUN = median)
Upvotes: 2