Reputation: 77
I have been trying to work out how to use a for loop to replace NA values in certain columns with the median of the column. So far I have this:
for (i in 1:ncol(merged_df_edit3)){
if(is.na(merged_df_edit3[,i]) == TRUE){
assign(merged_df_edit3[,i],replace_na(median(merged_df_edit3[,i])))
}
}
This runs, but gives the warning:
"In if (is.na(merged_df_edit3[, i]) == TRUE) { ... : the condition has length > 1 and only the first element will be used"
However, when I check the data frame, it hasn't replaced any values at all.
The data I am using is a mix of numeric, date and character columns, like this. There are some blanks in the character columns, but I do not need them filled.
library(tibble)
df <- tribble(
~`date Column`, ~`Numeric Column`, ~`Character Column`,
"1/1/2011", 123, "Left",
"1/2/2011", 124, "Right",
"1/3/2011", 125, "Left",
"1/4/2011", NA, "NA",
"1/5/2011", 132, "Right"
)
Thanks!
Upvotes: 2
Views: 767
Reputation: 41220
Instead of a loop, you could use dplyr, which will probably be a bit more efficient:
library(dplyr)
df <- df %>%
  mutate(across(where(is.numeric), function(x) if_else(is.na(x), median(x, na.rm = TRUE), x)))
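As a quick check, here is a minimal sketch that applies the same idea to the example data from the question. It assumes dplyr (1.0.0 or later, for across) and tibble are installed; the name df_filled is only for illustration, and the ~ lambda is just a shorter spelling of the function above:

```r
library(dplyr)
library(tibble)

# Example data from the question
df <- tribble(
  ~`date Column`, ~`Numeric Column`, ~`Character Column`,
  "1/1/2011", 123, "Left",
  "1/2/2011", 124, "Right",
  "1/3/2011", 125, "Left",
  "1/4/2011", NA, "NA",
  "1/5/2011", 132, "Right"
)

# Replace NA in every numeric column with that column's median;
# non-numeric columns are left untouched
df_filled <- df %>%
  mutate(across(where(is.numeric), ~ if_else(is.na(.x), median(.x, na.rm = TRUE), .x)))

# The NA in row 4 becomes 124.5, the median of 123, 124, 125 and 132
df_filled$`Numeric Column`
```

The result is stored in a new object here so the original df can still be compared against it.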
Upvotes: 1
Reputation: 76402
Only one for loop and one if condition are needed.
for (i in seq_len(ncol(df))) {
  # only numeric columns have a meaningful median
  if (is.numeric(df[[i]])) {
    na <- is.na(df[[i]])  # logical index of the missing rows
    df[na, i] <- median(df[[i]], na.rm = TRUE)
  }
}
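If the same replacement is needed for more than one data frame, the loop can be wrapped in a small helper. This is only a sketch in base R; the function name fill_na_median and the toy data frame are made up for illustration:

```r
# Hypothetical helper: replace NA in every numeric column
# with the median of that column and return the modified copy
fill_na_median <- function(data) {
  for (i in seq_along(data)) {
    col <- data[[i]]
    if (is.numeric(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
      data[[i]] <- col
    }
  }
  data
}

# Toy data frame (an assumption, not the asker's real data)
example <- data.frame(
  num = c(123, 124, 125, NA, 132),
  chr = c("Left", "Right", "Left", NA, "Right")
)

filled <- fill_na_median(example)
filled$num  # the NA is now 124.5
```

Working on the extracted column and assigning it back with [[ keeps the helper independent of whether the input is a data.frame or a tibble.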
Upvotes: 3
Reputation: 21908
If you insist on using a for loop, here is a solution that might help you. Note that I first check whether the column is numeric and then iterate over its rows to find the NA values.
library(tibble)
df <- tribble(
~`date Column`, ~`Numeric Column`, ~`Character Column`,
"1/1/2011", 123, "Left",
"1/2/2011", 124, "Right",
"1/3/2011", 125, "Left",
"1/4/2011", NA, "NA",
"1/5/2011", 132, "Right"
)
for(j in 1:ncol(df)) {
  if(is.numeric(df[[j]])) {
    for(i in 1:nrow(df)) {
      if(is.na(df[i, j])) {
        df[i, j] <- median(df[[j]], na.rm = TRUE)
      }
    }
  }
}
df
# A tibble: 5 x 3
`date Column` `Numeric Column` `Character Column`
<chr> <dbl> <chr>
1 1/1/2011 123 Left
2 1/2/2011 124 Right
3 1/3/2011 125 Left
4 1/4/2011 124. NA
5 1/5/2011 132 Right
The 4th element of Numeric Column has been replaced by the median of that column.
Upvotes: 1