Damo H

Reputation: 77

Replacing NA Values in a data frame with the median using a for loop

I have been trying to work out how to use a for loop to replace NA values in certain columns with the median of the column. So far I have this:

for (i in 1:ncol(merged_df_edit3)){
  if(is.na(merged_df_edit3[,i]) == TRUE){
    assign(merged_df_edit3[,i],replace_na(median(merged_df_edit3[,i])))
  }

}

This runs, but gives the warning:

"In if (is.na(merged_df_edit3[, i]) == TRUE) { ... : the condition has length > 1 and only the first element will be used"

However, when I check the data frame, it hasn't replaced any values at all.

The data I am using is a mix of numeric, date and character columns, like the example below. There are some blanks in the character columns, but I do not need them filled.

df <- tribble(
  ~`date Column`,   ~`Numeric Column`,  ~`Character Column`,
  "1/1/2011",   123,    "Left",
  "1/2/2011",   124,    "Right",
  "1/3/2011",   125,    "Left",
  "1/4/2011",   NA,   "NA",
  "1/5/2011",   132,    "Right"
)

Thanks!

Upvotes: 2

Views: 767

Answers (3)

Waldi

Reputation: 41220

Instead of a loop, you could use dplyr, which will probably be a bit more efficient:

library(dplyr)

df <- df %>%
  # Replace NAs in every numeric column with that column's median
  mutate(across(where(is.numeric), function(x) if_else(is.na(x), median(x, na.rm = TRUE), x)))
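
As a possible variation on the same idea, tidyr's replace_na() (the function the question tried to use) can stand in for the if_else() call. This is only a sketch, not part of the original answer: it assumes the dplyr and tidyr packages are installed, rebuilds the example data from the question, and stores the result under the illustrative name df_filled.

```r
library(dplyr)
library(tidyr)

# Example data from the question; the character "NA" is a string, not a missing value
df <- tribble(
  ~`date Column`, ~`Numeric Column`, ~`Character Column`,
  "1/1/2011", 123, "Left",
  "1/2/2011", 124, "Right",
  "1/3/2011", 125, "Left",
  "1/4/2011", NA,  "NA",
  "1/5/2011", 132, "Right"
)

# For every numeric column, fill missing values with that column's median
df_filled <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, median(.x, na.rm = TRUE))))

df_filled$`Numeric Column`
```

Character and date-like columns are skipped by where(is.numeric), so the blanks in those columns stay as they are.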

Upvotes: 1

Rui Barradas

Reputation: 76402

Only one for loop and one if condition are needed.

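for(i in seq_along(df)){
  # Only numeric columns have a meaningful median
  if(is.numeric(df[[i]])){
    # Logical vector marking the rows where this column is missing
    na <- is.na(df[[i]])
    df[na, i] <- median(df[[i]], na.rm = TRUE)
  }
}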
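
The warning in the question comes from passing a whole column to if(): is.na() returns one logical value per row, while if() expects a single TRUE or FALSE, so only the first element was checked (recent R versions, 4.2.0 and later, raise an error here instead of a warning). A minimal base R sketch of the difference, using a hypothetical vector x that mirrors the question's numeric column:

```r
# Hypothetical vector mirroring the question's numeric column
x <- c(123, 124, 125, NA, 132)

# is.na() is vectorised: it returns one logical per element, not a single value
na <- is.na(x)

# anyNA() collapses the check to a single TRUE/FALSE that is safe to use in if()
if (anyNA(x)) {
  # Logical indexing replaces only the missing positions
  x[na] <- median(x, na.rm = TRUE)
}

x
```

Because the replacement is done by logical indexing, no inner loop over rows is needed.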

Upvotes: 3

Anoushiravan R

Reputation: 21908

If you insist on using a for loop, here is a solution that might help. Note that I first check whether the column is numeric and then iterate over its rows to find the NA values.

df <- tribble(
  ~`date Column`,   ~`Numeric Column`,  ~`Character Column`,
  "1/1/2011",   123,    "Left",
  "1/2/2011",   124,    "Right",
  "1/3/2011",   125,    "Left",
  "1/4/2011",   NA,   "NA",
  "1/5/2011",   132,    "Right"
)

for(j in 1:ncol(df)) {
  if(is.numeric(df[[j]])) {
    for(i in 1:nrow(df)) {
      if(is.na(df[i, j])) {
        df[i, j] <- median(df[[j]], na.rm = TRUE)
      }
    }
  }
}
df

# A tibble: 5 x 3
  `date Column` `Numeric Column` `Character Column`
  <chr>                    <dbl> <chr>             
1 1/1/2011                  123  Left              
2 1/2/2011                  124  Right             
3 1/3/2011                  125  Left              
4 1/4/2011                  124. NA                
5 1/5/2011                  132  Right 

The 4th element of Numeric Column has been replaced by the median of that column (124.5, which the tibble printout abbreviates to 124.).

Upvotes: 1
