user3664020
user3664020

Reputation: 3020

Replace NAs with mean of the same column of a data.table

I want to replace NAs present in a column of a DATA TABLE with the mean of the same column. I am doing the following. But it is not working.

ww <- data.table(iris)

ww <- ww[1:5 , ]

ww[1,1] <- NA

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:           NA         3.5          1.4         0.2  setosa
2:          4.9         3.0          1.4         0.2  setosa
3:          4.7         3.2          1.3         0.2  setosa
4:          4.6         3.1          1.5         0.2  setosa
5:          5.0         3.6          1.4         0.2  setosa


ww[is.na(Sepal.Length) , Sepal.Length:= mean(Sepal.Length, na.rm = T)]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:          NaN         3.5          1.4         0.2  setosa
2:          4.9         3.0          1.4         0.2  setosa
3:          4.7         3.2          1.3         0.2  setosa
4:          4.6         3.1          1.5         0.2  setosa
5:          5.0         3.6          1.4         0.2  setosa

Why am I getting NaN in place of NA when it should have been the mean of the rest of the values (4.9, 4.7, 4.6, 5.0)?

What is the alternate of acheiving this in case something is wrong with this syntax?

I want to the syntax for the data table.

Upvotes: 16

Views: 3241

Answers (7)

Edo
Edo

Reputation: 7818

just one line will do the trick with fcoalesce:

ww[, Sepal.Length := fcoalesce(Sepal.Length, mean(Sepal.Length, na.rm = TRUE))]
ww

#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <num>       <num>        <num>       <num>  <fctr>
#> 1:          4.8         3.5          1.4         0.2  setosa
#> 2:          4.9         3.0          1.4         0.2  setosa
#> 3:          4.7         3.2          1.3         0.2  setosa
#> 4:          4.6         3.1          1.5         0.2  setosa
#> 5:          5.0         3.6          1.4         0.2  setosa

Upvotes: 1

jangorecki
jangorecki

Reputation: 16697

While the zoo answer is pretty nice it requires new dependency.
Using just data.table you could do the following.

library(data.table)

# prepare data
ww = data.table(iris[1:5,])
ww[1, Sepal.Length := NA]

# solution
ww[, Sepal.Length.mean := mean(Sepal.Length, na.rm = TRUE) # calculate mean
   ][is.na(Sepal.Length), Sepal.Length := Sepal.Length.mean # replace NA with mean
     ][, Sepal.Length.mean := NULL # remove mean col
       ][] # just prints

While it may looks biggish comparing to zoo's, it is performance efficient as all steps are made using update by reference :=. It can also be easily tuned to replace NA with mean by group, just using by argument in data.table.

Upvotes: 13

phiver
phiver

Reputation: 23608

In base R:

ww$Sepal.Length[is.na(ww$Sepal.Length)] <- mean(ww$Sepal.Length, na.rm = T)

Upvotes: 6

jeremycg
jeremycg

Reputation: 24945

tidyr has a built in function, replace_na you can use for this:

library(tidyr)
ww %>% replace_na(list(Sepal.Length = mean(.$Sepal.Length, na.rm = TRUE)))

Upvotes: 3

Akhil Nair
Akhil Nair

Reputation: 3274

Your attempt subsetted the table first, selecting

> ww[is.na(Sepal.Length)]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:   

    NA         3.5          1.4         0.2  setosa

so any further operations can only 'see' these rows - i.e. Sepal.Length can only see that one NA.

The data.table solution you want is below - it looks at the whole table and replaces the NAs with the means using an ifelse.

ww[, Sepal.Length := ifelse(is.na(Sepal.Length), mean(Sepal.Length, na.rm = TRUE), Sepal.Length)]

Upvotes: 7

G. Grothendieck
G. Grothendieck

Reputation: 269421

na.aggregate in the zoo package replaces NAs with the mean of the non-NAs in the same column:

library(zoo)

ww[, Sepal.Length := na.aggregate(Sepal.Length)]

Upvotes: 21

pcantalupo
pcantalupo

Reputation: 2226

It is not taking the mean of the entire Sepal.Length column; only the 1 column that you have chosen.

Rather use:

ww[is.na(Sepal.Length) , Sepal.Length:= mean(ww$Sepal.Length, na.rm=TRUE)]

Upvotes: 4

Related Questions