Reputation: 462
I have a dataset dt and I want to replace the NA values with the mode of each attribute within each id, as follows:
Before:
id att
1 v
1 v
1 NA
1 c
2 c
2 v
2 NA
2 c
The outcome I am looking for is:
id att
1 v
1 v
1 v
1 c
2 c
2 v
2 c
2 c
I have made some attempts. For example, I found a similar question that replaced the NA values with the mean (which has a built-in function), so I tried to adjust that code as follows:
for (i in 1:dim(dt)[1]) {
  if (is.na(dt$att[i])) {
    att_mode <- # I am stuck here to return the mode of an attribute based on ID
    dt$att[i] <- att_mode
  }
}
I found the following function to calculate the mode:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
from the following link: Is there a built-in function for finding the mode?
But I have no idea how to apply it inside the for loop. I tried the apply and ave functions, but they do not seem to be the right choice because of the different dimensions.
Could anyone help with how to return the mode in my for loop?
Thank you
Upvotes: 3
Views: 1258
Reputation: 887691
We can use na.aggregate from library(zoo) and specify the FUN as Mode. If this is a group-by operation, we can do it using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'id', apply na.aggregate:
library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
# id att
#1: 1 v
#2: 1 v
#3: 1 v
#4: 1 c
#5: 2 c
#6: 2 v
#7: 2 c
#8: 2 c
A similar option with dplyr:
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(att = na.aggregate(att, FUN=Mode))
NOTE: Mode is from the OP's post. Also, this assumes that 'att' is of character class.
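If extra packages are not an option, the same group-wise fill can probably be done in base R with ave. The following is only a minimal sketch: df1 is rebuilt here from the table in the question (an assumption about how the data is stored), Mode is copied from the OP's post, and fill_na_mode is a hypothetical helper name introduced for illustration.

```r
# Example data reconstructed from the question (assumption: 'att' is character)
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# Mode function from the OP's post
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: replace the NAs in x with the mode of the non-missing values
fill_na_mode <- function(x) {
  if (all(is.na(x))) return(x)  # no non-missing values to take a mode from
  x[is.na(x)] <- Mode(x[!is.na(x)])
  x
}

# ave() applies the helper within each id group and returns a vector
# of the same length as the input, so it can be assigned back directly
df1$att <- ave(df1$att, df1$id, FUN = fill_na_mode)
df1
```

The NAs are dropped before calling Mode because unique keeps NA as a value, so an NA could otherwise be returned as the mode when it is the most frequent entry in a group. In case of ties, Mode returns the value that appears first in the group.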
Upvotes: 2