Reputation: 976

Using dplyr first and last but ignoring NA values

I have the below dataframe and I need to ignore the missing value for the last value for id 9.

firstlast <- data.frame(id = as.factor(c("01", "01", "01", "01", "01", "04", "04", "05", "05", "05", "05", "05", "09", "09", "09", "09", "09")),
                 var_a = c(13, 21, 32, 33, 44, 21, 33, 35, 17, 18, 21, 22, 17, 13, 33, 32, NA))

The df looks like:

   id var_a
1  01    13
2  01    21
3  01    32
4  01    33
5  01    44
6  04    21
7  04    33
8  05    35
9  05    17
10 05    18
11 05    21
12 05    22
13 09    17
14 09    13
15 09    33
16 09    32
17 09    NA

My attempt thus far:

firstlast <- firstlast %>% group_by(id) %>%
  mutate(var_first = first(var_a)) %>%
  mutate(var_last = last(var_a)) %>%
  mutate(change = var_last - var_first)

creates:

       id var_a var_first var_last change
   (fctr) (dbl)     (dbl)    (dbl)  (dbl)
1      01    13        13       44     31
2      01    21        13       44     31
3      01    32        13       44     31
4      01    33        13       44     31
5      01    44        13       44     31
6      04    21        21       33     12
7      04    33        21       33     12
8      05    35        35       22    -13
9      05    17        35       22    -13
10     05    18        35       22    -13
11     05    21        35       22    -13
12     05    22        35       22    -13
13     09    17        17       NA     NA
14     09    13        17       NA     NA
15     09    33        17       NA     NA
16     09    32        17       NA     NA
17     09    NA        17       NA     NA

However I need to ignore the missing value for the last NA observation for id 9 and instead use the second to last (or the first observed value). When I include mutate(var_last = last(var_a), na.rm=TRUE) at that line, I then get a full column of na.rm = TRUE and not ignoring the NA value.

The end df I am hoping to achieve is...

       id var_a var_first var_last change
   (fctr) (dbl)     (dbl)    (dbl)  (dbl)
1      01    13        13       44     31
2      01    21        13       44     31
3      01    32        13       44     31
4      01    33        13       44     31
5      01    44        13       44     31
6      04    21        21       33     12
7      04    33        21       33     12
8      05    35        35       22    -13
9      05    17        35       22    -13
10     05    18        35       22    -13
11     05    21        35       22    -13
12     05    22        35       22    -13
13     09    17        17       32     15
14     09    13        17       32     15
15     09    33        17       32     15
16     09    32        17       32     15
17     09    NA        17       32     15

Thank you!

Upvotes: 6

Answers (3)

Maël

Reputation: 51914

Since dplyr 1.1.0, first, last and nth include a na_rm argument:

library(dplyr)
firstlast %>% 
  mutate(var_first = first(var_a, na_rm = TRUE),
         var_last = last(var_a, na_rm = TRUE),
         change = var_last - var_first,
         .by = id)

Upvotes: 1

akrun

Reputation: 886938

Here is an option using data.table

 library(data.table)
 setDT(firstlast)[,  c("var_first", "var_last") := .(var_a[!is.na(var_a)][1], 
       tail(var_a[!is.na(var_a)], 1)), 
                , by = id][, change := var_last - var_first][]
#     id var_a var_first var_last change
# 1: 01    13        13       44     31
# 2: 01    21        13       44     31
# 3: 01    32        13       44     31
# 4: 01    33        13       44     31
# 5: 01    44        13       44     31
# 6: 04    21        21       33     12
# 7: 04    33        21       33     12
# 8: 05    35        35       22    -13
# 9: 05    17        35       22    -13
#10: 05    18        35       22    -13
#11: 05    21        35       22    -13
#12: 05    22        35       22    -13
#13: 09    17        17       32     15
#14: 09    13        17       32     15
#15: 09    33        17       32     15
#16: 09    32        17       32     15
#17: 09    NA        17       32     15

Upvotes: 1

Gregor Thomas

Reputation: 145755

mutate doesn't have an na.rm option, neither does first() or last() which would be more appropriate here. You can omit missing values yourself with na.omit:

firstlast <- firstlast %>% group_by(id) %>%
    mutate(
        var_first = first(na.omit(var_a)),
        var_last = last(na.omit(var_a)),
        change = var_last - var_first
    )
# Source: local data frame [17 x 5]
# Groups: id [4]
# 
#        id var_a var_first var_last change
#    (fctr) (dbl)     (dbl)    (dbl)  (dbl)
# 1      01    13        13       44     31
# 2      01    21        13       44     31
# 3      01    32        13       44     31
# 4      01    33        13       44     31
# 5      01    44        13       44     31
# 6      04    21        21       33     12
# 7      04    33        21       33     12
# 8      05    35        35       22    -13
# 9      05    17        35       22    -13
# 10     05    18        35       22    -13
# 11     05    21        35       22    -13
# 12     05    22        35       22    -13
# 13     09    17        17       32     15
# 14     09    13        17       32     15
# 15     09    33        17       32     15
# 16     09    32        17       32     15
# 17     09    NA        17       32     15

As a sidenote, giving mutate an na.rm argument doesn't make much sense. What NA values would you omit? Any rows from any columns used in the calculation? Other inputs? Just the result? How would you fill the removed rows to still have the same values? It's much better to be explicit as above.

Upvotes: 7

Using dplyr first and last but ignoring NA values

Answers (3)

Related Questions