Carl

Reputation: 111

Fill in missing values with a factor value based on an ID variable

I would like to fill the <NA> values with the correct factor value based on the ID variable.

Here are the variables:

ID <- c(1,1,1,2,2,2,3,3,3)
Gender_NA <- c("m",NA,"m",NA,"f",NA,"m","m",NA)
Gender  <- c("m","m","m","f","f","f","m","m","m")

Here are the data I have:

Data_have <- data.frame (ID,Gender_NA)

ID    Gender_NA
 1     m
 1    <NA>
 1     m
 2    <NA>
 2     f
 2    <NA>
 3     m
 3     m
 3    <NA>

Here are the data I want to have:

Data_whant <- data.frame (ID,Gender)

ID Gender
1    m
1    m
1    m
2    f
2    f
2    f
3    m
3    m
3    m

I have tried to find a solution on this forum, but I can't get it to work.

Help would be much appreciated.

Upvotes: 2

Views: 868

Answers (2)

akrun

Reputation: 887911

The na.locf function from library(zoo) replaces each NA element with the nearest preceding non-NA element. Using data.table, we convert the 'data.frame' to a 'data.table' with setDT and, grouped by 'ID', replace the NA elements with the previous non-NA value. If the first element of a group is NA, it has no previous value and is left unchanged by this first pass (na.rm=FALSE keeps it in place), so we wrap the call in a second na.locf with the option fromLast=TRUE to replace the remaining NAs with the succeeding non-NA elements.

library(zoo)
library(data.table)
setDT(Data_have)[, Gender := na.locf(na.locf(Gender_NA, 
            na.rm=FALSE),fromLast=TRUE), by = ID][, Gender_NA := NULL]
Data_have
#    ID Gender
#1:  1      m
#2:  1      m
#3:  1      m
#4:  2      f
#5:  2      f
#6:  2      f
#7:  3      m
#8:  3      m
#9:  3      m
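To see why two passes are needed, here is a minimal sketch that applies each na.locf call to the values of ID 2 on their own. The vector x and the names forward and both are made up for illustration; they are not part of the answer's code.

```r
library(zoo)

# Values of Gender_NA for ID 2: a leading NA, then "f", then a trailing NA
x <- c(NA, "f", NA)

# First pass: carry the last observation forward.
# na.rm = FALSE keeps the leading NA in place instead of dropping it.
forward <- na.locf(x, na.rm = FALSE)
forward
# NA  "f" "f"

# Second pass: carry observations backward to fill the leading NA.
both <- na.locf(forward, fromLast = TRUE)
both
# "f" "f" "f"
```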

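Or while grouping by ID, we can omit all NAs using na.omit() and pick the first element as follows (starting again from the original Data_have, because the code above removes Gender_NA by reference):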

setDT(Data_have)[, Gender := na.omit(Gender_NA)[1L], by =  ID][, Gender_NA := NULL]

Or using the same method with dplyr:

library(dplyr)
Data_have %>% 
     group_by(ID) %>%
     transmute(Gender= first(na.omit(Gender_NA)))
#    ID Gender
#   (dbl) (fctr)
#1     1      m
#2     1      m
#3     1      m
#4     2      f
#5     2      f
#6     2      f
#7     3      m
#8     3      m
#9     3      m
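The same "first non-NA value per ID" idea can also be sketched without any packages. This is a base R variant using ave(), not something from the answer above; it assumes Gender_NA is a character vector and that every ID has at least one non-NA value. The name Gender_filled is made up for illustration.

```r
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Gender_NA <- c("m", NA, "m", NA, "f", NA, "m", "m", NA)

# For each ID, take the first non-NA value; ave() recycles that single
# value over all rows of the group.
Gender_filled <- ave(Gender_NA, ID, FUN = function(x) na.omit(x)[1])

data.frame(ID, Gender = Gender_filled)
```

If a group contains only NAs, na.omit(x)[1] is NA, so that group stays NA.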

Upvotes: 2

Arun

Reputation: 118889

Here's how I'd do it using data.table:

require(data.table) # v1.9.6+
dt = data.table(ID, Gender_NA)
# Gender_NA is of character type

And here's the answer:

dt[is.na(Gender_NA), Gender_NA := na.omit(dt)[.SD, Gender_NA, mult="first", on="ID"]]
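The one-liner above packs a join into an update. As a rough sketch of what it does, the same steps can be written out one at a time; the names lookup, missing_rows and filled are made up for illustration and do not appear in the answer.

```r
library(data.table)

ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Gender_NA <- c("m", NA, "m", NA, "f", NA, "m", "m", NA)
dt <- data.table(ID, Gender_NA)

# Rows where the gender is known; these act as the lookup table.
lookup <- na.omit(dt)

# Rows that still need a value (this is what .SD holds in the one-liner).
missing_rows <- dt[is.na(Gender_NA)]

# Join the missing rows to the lookup on ID. mult = "first" keeps only the
# first matching lookup row per missing row, so the result has exactly one
# value for each row being filled.
filled <- lookup[missing_rows, Gender_NA, mult = "first", on = "ID"]

# Assign the looked-up values by reference to the NA rows only.
dt[is.na(Gender_NA), Gender_NA := filled]
dt
```

Only the NA rows are touched, so existing values are never overwritten.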

Upvotes: 1
