Edu
Edu

Reputation: 903

Recoding missing data in longitudinal data frames with R

I have a data frame with a similar longitudinal structure as data:

data = data.frame (
   ID = c("a","a","a","b","b","b","c","c", "c"), 
   period = c(1,2,3,1,2,3,1,2,3),
   size = c(3,3,NA, NA, NA,1, 14,14, 14))

The values of the variable size are fixed so that each period has the same value for size. Yet some observations have missing values. My aim consists of replacing these missing values with the value of size associated with the periods where there is no missing (e.g. 3 for ID "a" and 1 for ID "b").

The desired data frame should look something similar to:

data.1 

    ID period value
      a      1     3
      a      2     3
      a      3     3
      b      1     1
      b      2     1
      b      3     1
      c      1    14
      c      2    14
      c      3    14

I have tried different combinations of the formula below but I don't get the result I am looking for.

library(dplyr)

data.1 = data %>% group_by(ID) %>% 
  mutate(new.size = ifelse(is.na(size), !is.na(size),
                          ifelse(!is.na(size), size, 0)))

That yields the following:

data.1
Source: local data frame [9 x 4]
Groups: ID [3]

      ID period  size new.size
  (fctr)  (dbl) (dbl)    (dbl)
1      a      1     3        3
2      a      2     3        3
3      a      3    NA        0
4      b      1    NA        0
5      b      2    NA        0
6      b      3     1        1
7      c      1    14       14
8      c      2    14       14
9      c      3    14       14

I would be grateful if someone could give me a hint on how to get the right solution.

Upvotes: 1

Views: 328

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389095

To correct your code, you can try the following with dplyr

library(dplyr)
data %>% group_by(ID) %>% 
         mutate(new.size = ifelse(is.na(size), size[!is.na(size)],size))

#      ID   period  size new.size
#     (fctr)  (dbl) (dbl)    (dbl)
#1      a      1     3        3
#2      a      2     3        3
#3      a      3    NA        3
#4      b      1    NA        1
#5      b      2    NA        1
#6      b      3     1        1
#7      c      1    14       14
#8      c      2    14       14
#9      c      3    14       14

Or a base R alternative with ave

data$new.size <- ave(data$size,data$ID, FUN=function(x)unique(x[!is.na(x)]))
data$new.size

#[1]  3  3  3  1  1  1 14 14 14

Upvotes: 3

Mamoun Benghezal
Mamoun Benghezal

Reputation: 5314

here another solution using dplyr with na.omit

group_by(data, ID) %>%
     mutate(value=na.omit(size)[1])
Source: local data frame [9 x 4]
Groups: ID [3]

      ID period  size value
  <fctr>  <dbl> <dbl> <dbl>
1      a      1     3     3
2      a      2     3     3
3      a      3    NA     3
4      b      1    NA     1
5      b      2    NA     1
6      b      3     1     1
7      c      1    14    14
8      c      2    14    14
9      c      3    14    14

note that you can replace na.omit with max(size, na.rm=TRUE) if you are looking for maximum for example.

Upvotes: 3

Raad
Raad

Reputation: 2715

How about this with base R:

vals <- unique(na.omit(data[, c("ID", "size")]))
data$size <- vals$size[match(data$ID, vals$ID)]


  ID period size
1  a      1    3
2  a      2    3
3  a      3    3
4  b      1    1
5  b      2    1
6  b      3    1
7  c      1   14
8  c      2   14
9  c      3   14

Upvotes: 3

Related Questions