sebastian.klotz

Reputation: 125

Replace NA by max value in column in a list of data frames

I have two data frames and two questions. In both data frames df1 and df2, I can replace the NAs by 0.

df1

 var1 <- c(1, NA, 2, NA, 4, 5, 5)
 var2 <- c(1, 2, 3, 4, 5, 6, 7)
 df1 <- data.frame(var1, var2)
 df1$var1[is.na(df1$var1)] <- 0

df2

 var1 <- c(1, NA, 2, NA, 4, 5, 9)
 var2 <- c(1, 2, 3, 4, 5, 6, 7)
 df2 <- data.frame(var1, var2)
 df2$var1[is.na(df2$var1)] <- 0

But how would this work if I wanted to replace the NAs by the maximum value of var1 rather than 0? I thought it would be the following but it does not work.

 df1$var1[is.na(df1$var1)] <- max(df1$var1)

Once this is solved, I would actually like to do this for a list of data frames using lapply.

 mylist <- list(df1, df2)

My idea was something like the following - which does not work either.

lapply(mylist, function(x) x$var1[is.na(x$var1)] <- max(x$var1))

Many thanks for your help!

Upvotes: 2

Views: 1452

Answers (1)

IRTFM

Reputation: 263332

You need to use na.rm=TRUE in max. Without it, max returns NA whenever the vector contains an NA:

>  df1$var1[is.na(df1$var1)] <- max(df1$var1, na.rm=TRUE)
> 
> 
>  var1 <- c(1, NA, 2, NA, 4, 5, 9)
>  var2 <- c(1, 2, 3, 4, 5, 6, 7)
>  df2 <- data.frame(var1, var2)
>  df2$var1[is.na(df2$var1)] <-  max(df2$var1, na.rm=TRUE)
> df1
  var1 var2
1    1    1
2    5    2
3    2    3
4    5    4
5    4    5
6    5    6
7    5    7
> df2
  var1 var2
1    1    1
2    9    2
3    2    3
4    9    4
5    4    5
6    5    6
7    9    7
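
To see why the original attempt failed, here is a minimal sketch. The vector v is a made-up example that mirrors var1 from df1; the names m_default and m_narm are only for illustration.

```r
# Hypothetical vector for illustration, same values as df1$var1
v <- c(1, NA, 2, NA, 4, 5, 5)

# By default any NA propagates through max(), so the result is NA
m_default <- max(v)

# With na.rm = TRUE the NAs are dropped before the maximum is taken
m_narm <- max(v, na.rm = TRUE)

print(m_default)
print(m_narm)
```

Assigning m_default to the NA positions just writes NA back, which is why the data frame looked unchanged.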

Your attempt with lapply missed the fact that the modified data frame must be the last object evaluated in the function. The result of a [<- assignment is just the assigned value and not the full data frame, so the function has to return x explicitly:

lapply(mylist, function(x) {x$var1[is.na(x$var1)] <- max(x$var1, na.rm=TRUE); x})
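
Note that lapply returns a new list and leaves mylist untouched, so the result has to be assigned to keep it. A small sketch of the whole workflow, assuming the data frames from the question; fill_na_with_max and its col argument are hypothetical names introduced here, not part of base R:

```r
# Rebuild the two data frames from the question
df1 <- data.frame(var1 = c(1, NA, 2, NA, 4, 5, 5),
                  var2 = c(1, 2, 3, 4, 5, 6, 7))
df2 <- data.frame(var1 = c(1, NA, 2, NA, 4, 5, 9),
                  var2 = c(1, 2, 3, 4, 5, 6, 7))
mylist <- list(df1, df2)

# Hypothetical helper: replace NAs in one column by that column's maximum
fill_na_with_max <- function(x, col = "var1") {
  x[[col]][is.na(x[[col]])] <- max(x[[col]], na.rm = TRUE)
  x  # return the whole data frame, not just the assigned value
}

# lapply does not modify mylist in place, so assign the result back
mylist <- lapply(mylist, fill_na_with_max)

print(mylist[[1]]$var1)
print(mylist[[2]]$var1)
```

One caveat with this approach: if a column contains only NAs, max(..., na.rm=TRUE) returns -Inf with a warning, so such columns may need separate handling.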

Upvotes: 3
