Neerav Makwana

Reputation: 17

Fill missing values in dataframe columns with column median in R

I have a dataframe with some columns of type "factor" and others "numeric". There are no missing values in any of the "factor" columns.

I am trying to replace the missing values in each column with that column's median, using the following code:

for(i in 1:ncol(df3)){
  df3[is.na(df3[,i]), i] <- median(df3[,i], na.rm = TRUE)
}

However, I am getting the error:

Error in median.default(df3[, i], na.rm = TRUE) : need numeric data

I am sure that the missing values occur only in the numeric columns, so why am I getting this error?

More importantly, how do I fill missing values in each column with respective column medians?

Upvotes: 0

Views: 517

Answers (1)

rdh

Reputation: 1045

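Even if df3[is.na(df3[, i]), i] selects zero rows, R still evaluates the right-hand side, median(df3[, i], na.rm = TRUE), for every column. For a factor column, median() stops with "need numeric data", whether or not that column has any missing values. You could add a check so that only numeric columns are processed: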

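for (i in seq_along(df3)) {
  # Skip factor (and any other non-numeric) columns
  if (is.numeric(df3[[i]])) {
    # df3[[i]] always returns the column as a vector
    df3[is.na(df3[[i]]), i] <- median(df3[[i]], na.rm = TRUE)
  }
}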
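
If you prefer to avoid the explicit loop, the same idea can be sketched with vapply() and lapply(). This is only a minimal illustration: the small df3 built here is made-up example data, and the names num_cols and col are my own, not something from your code.

```r
# Hypothetical example data: one factor column and two numeric columns with NAs
df3 <- data.frame(
  grp = factor(c("a", "b", "a", "b")),
  x   = c(1, NA, 3, 10),
  y   = c(NA, 2, 4, 6)
)

# Logical vector marking which columns are numeric
num_cols <- vapply(df3, is.numeric, logical(1))

# For each numeric column, replace NAs with that column's median;
# factor columns are never passed to median(), so no error is raised
df3[num_cols] <- lapply(df3[num_cols], function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})
```

Because the numeric columns are selected up front, the factor column grp is left untouched.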

Upvotes: 1
