Shaher Jamal Eddin

Reputation: 462

Replace NA with mode based on ID attribute

I have a dataset dt and I want to replace the NA values in each attribute with the mode of that attribute within the same id, as follows:

Before:

 id  att  
  1  v
  1  v
  1  NA
  1  c
  2  c
  2  v
  2  NA
  2  c

The outcome I am looking for is:

 id  att
  1  v
  1  v
  1  v
  1  c
  2  c
  2  v
  2  c
  2  c

I have made a few attempts. For example, I found a similar question that replaces NA with the mean (which has a built-in function), so I tried to adapt its code as follows:

for (i in 1:dim(dt)[1]) {
    if (is.na(dt$att[i])) {
      att_mode <-                  # I am stuck here to return the mode of an attribute based on ID
      dt$att[i] <- att_mode 
    }
  }

I found the following function to calculate the mode

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

from the following link: Is there a built-in function for finding the mode?

But I have no idea how to apply it inside the for loop. I also tried the apply and ave functions, but they do not seem to be the right choice because of the different dimensions.

Could anyone help on how to return the mode in my for loop?

Thank you

Upvotes: 3

Views: 1258

Answers (1)

akrun

Reputation: 887691

We can use na.aggregate from library(zoo) and specify FUN as Mode. Since this is a group-by operation, we can do it with data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'id', apply na.aggregate to 'att':

library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
#    id att
#1:  1   v
#2:  1   v
#3:  1   v
#4:  1   c
#5:  2   c
#6:  2   v
#7:  2   c
#8:  2   c

A similar option with dplyr

library(dplyr)
df1 %>%
     group_by(id) %>%
     mutate(att = na.aggregate(att, FUN=Mode))

NOTE: Mode is the function from the OP's post. This also assumes that 'att' is of class character.
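For completeness, here is a minimal base R sketch of the same group-wise fill, with no extra packages. It is an illustration under stated assumptions and was not part of the original answer: the data frame is rebuilt from the question's example, Mode is the question's function with one added line that drops NA values first, and fill_mode is a hypothetical helper name introduced only for this example.

```r
# Mode as given in the question, plus one added line that drops NAs first
# so that NA itself can never be picked as the most frequent value.
Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper (not from the original post): replace the NAs of one
# group with that group's mode. A group with no usable mode (for example,
# one that is entirely NA) is returned unchanged.
fill_mode <- function(x) {
  m <- Mode(x)
  if (length(m) == 1L && !is.na(m)) {
    x[is.na(x)] <- m
  }
  x
}

# Example data rebuilt from the question; 'att' is kept as character.
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# ave() splits 'att' by 'id', applies fill_mode to each piece, and puts the
# results back in the original row order.
df1$att <- ave(df1$att, df1$id, FUN = fill_mode)
df1
```

On the example data this reproduces the outcome requested in the question. Because ave() returns a vector of the same length as its input, the function passed to it has to return the whole filled group rather than a single mode value, which may be why a direct ave(att, id, FUN = Mode) call did not fit. When two values are tied, this Mode returns the one that appears first in the group.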

Upvotes: 2
