Soma

Reputation: 329

How can I use na.locf() for factor 'inputs'?

I want to fill the NAs in the group variable of my dataset with the value that the same ID had in previous years. The na.locf(newData, na.rm = TRUE) part of my code does not work. I think it is because the input is not a number, or is it something else? Does anyone know how to fix this problem?

  for (i in my_data$ID){
    newData = my_data[my_data$ID==i,c('ID','Year', 'group')][3]
    na.locf(newData,na.rm = TRUE)

  } 

My dataset is very big, but here is a sample of what I need:

structure(list(ID = c(1L, 2L, 3L, 1L, 1L, 1L), Year = c(2000L, 
2000L, 2001L, 2001L, 2002L, 2003L), Group = structure(c(2L, 3L, 
2L, 1L, 1L, 4L), .Label = c("", "\"A\"", "\"B\"", "\"C\""), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")

The result should look like this:

structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L), Year = c(2000L, 
2001L, 2002L, 2003L, 2000L, 2002L), Group = structure(c(1L, 1L, 
1L, 3L, 2L, 2L), .Label = c("\"A\"", "\"B\"", "\"C\""), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")

Upvotes: 2

Views: 153

Answers (2)

hello_friend

Reputation: 5788

Base R, using @Sotos's with/replace/ave logic:

df$Group <- with(replace(df, df == '', NA),
                  ave(Group, ID, FUN = function(x){na.omit(x)[cumsum(!is.na(x))]}))

Data:

df <- structure(
  list(
    ID = c(1L, 2L, 3L, 1L, 1L, 1L),
    Year = c(2000L,
             2000L, 2001L, 2001L, 2002L, 2003L),
    Group = structure(
      c(2L, 3L,
        2L, 1L, 1L, 4L),
      .Label = c("", "\"A\"", "\"B\"", "\"C\""),
      class = "factor"
    )
  ),
  row.names = c(NA,
                6L),
  class = "data.frame"
)
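
To see why the indexing inside the ave() call above carries values forward, here is a minimal sketch on a toy character vector. The vector x is made up for illustration, and x[!is.na(x)] plays the role of na.omit(x). Note the assumption that the first value is not NA: a leading NA gives an index of 0, which R silently drops, so the result would come back shorter than the input.

```r
# Toy vector (hypothetical): NA marks the gaps; the first value is not NA
x <- c("A", NA, NA, "C", NA)

# Running count of non-NA values seen so far: 1 1 1 2 2
idx <- cumsum(!is.na(x))

# Index the non-NA values with that count to carry each one forward
filled <- x[!is.na(x)][idx]
filled
# [1] "A" "A" "A" "C" "C"
```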

Upvotes: 1

Sotos

Reputation: 51582

So as I said, your problem was simply that you had to replace the empty strings ('') with NA, because na.locf() only fills NA values.

with(replace(df, df == '', NA), ave(Group, ID, FUN = zoo::na.locf))
#[1] "A" "B" "A" "A" "A" "C"

Attaching it back to your df,

df$Group <- with(replace(df, df == '', NA), ave(Group, ID, FUN = zoo::na.locf))

which gives,

  ID Year Group
1  1 2000   "A"
2  2 2000   "B"
3  3 2001   "A"
4  1 2001   "A"
5  1 2002   "A"
6  1 2003   "C"
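
For anyone who wants to run the whole thing end to end, here is a self-contained sketch of the same idea. It rests on a few assumptions: the zoo package is installed, the sample data is simplified (levels written as plain A/B/C rather than with the literal quotes from the question), and the column is converted to character before filling. Passing na.rm = FALSE is my addition, so that a group starting with NA keeps its length instead of being trimmed.

```r
library(zoo)  # provides na.locf()

# Simplified copy of the sample data; "" marks the missing groups
df <- data.frame(
  ID    = c(1L, 2L, 3L, 1L, 1L, 1L),
  Year  = c(2000L, 2000L, 2001L, 2001L, 2002L, 2003L),
  Group = factor(c("A", "B", "A", "", "", "C"))
)

# Work on a character copy and turn the empty strings into real NAs
grp <- as.character(df$Group)
grp[grp == ""] <- NA

# Fill within each ID; na.rm = FALSE keeps leading NAs rather than dropping them
filled <- ave(grp, df$ID, FUN = function(x) na.locf(x, na.rm = FALSE))
df$Group <- factor(filled)
df
```

The fill follows row order within each ID, so if the rows are not already in chronological order, sort by ID and Year first.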

Upvotes: 5
