Reputation: 976
I have the below dataframe and I need to ignore the missing value for the last value for id 9.
firstlast <- data.frame(id = as.factor(c("01", "01", "01", "01", "01", "04", "04", "05", "05", "05", "05", "05", "09", "09", "09", "09", "09")),
var_a = c(13, 21, 32, 33, 44, 21, 33, 35, 17, 18, 21, 22, 17, 13, 33, 32, NA))
The df looks like:
id var_a
1 01 13
2 01 21
3 01 32
4 01 33
5 01 44
6 04 21
7 04 33
8 05 35
9 05 17
10 05 18
11 05 21
12 05 22
13 09 17
14 09 13
15 09 33
16 09 32
17 09 NA
My attempt thus far:
firstlast <- firstlast %>% group_by(id) %>%
mutate(var_first = first(var_a)) %>%
mutate(var_last = last(var_a)) %>%
mutate(change = var_last - var_first)
creates:
id var_a var_first var_last change
(fctr) (dbl) (dbl) (dbl) (dbl)
1 01 13 13 44 31
2 01 21 13 44 31
3 01 32 13 44 31
4 01 33 13 44 31
5 01 44 13 44 31
6 04 21 21 33 12
7 04 33 21 33 12
8 05 35 35 22 -13
9 05 17 35 22 -13
10 05 18 35 22 -13
11 05 21 35 22 -13
12 05 22 35 22 -13
13 09 17 17 NA NA
14 09 13 17 NA NA
15 09 33 17 NA NA
16 09 32 17 NA NA
17 09 NA 17 NA NA
However I need to ignore the missing value for the last NA observation for id 9 and instead use the second to last (or the first observed value). When I include mutate(var_last = last(var_a), na.rm=TRUE)
at that line, I then get a full column of na.rm = TRUE and not ignoring the NA value.
The end df I am hoping to achieve is...
id var_a var_first var_last change
(fctr) (dbl) (dbl) (dbl) (dbl)
1 01 13 13 44 31
2 01 21 13 44 31
3 01 32 13 44 31
4 01 33 13 44 31
5 01 44 13 44 31
6 04 21 21 33 12
7 04 33 21 33 12
8 05 35 35 22 -13
9 05 17 35 22 -13
10 05 18 35 22 -13
11 05 21 35 22 -13
12 05 22 35 22 -13
13 09 17 17 32 15
14 09 13 17 32 15
15 09 33 17 32 15
16 09 32 17 32 15
17 09 NA 17 32 15
Thank you!
Upvotes: 6
Views: 3887
Reputation: 51914
Since dplyr 1.1.0
, first
, last
and nth
include a na_rm
argument:
library(dplyr)
firstlast %>%
mutate(var_first = first(var_a, na_rm = TRUE),
var_last = last(var_a, na_rm = TRUE),
change = var_last - var_first,
.by = id)
Upvotes: 1
Reputation: 886938
Here is an option using data.table
library(data.table)
setDT(firstlast)[, c("var_first", "var_last") := .(var_a[!is.na(var_a)][1],
tail(var_a[!is.na(var_a)], 1)),
, by = id][, change := var_last - var_first][]
# id var_a var_first var_last change
# 1: 01 13 13 44 31
# 2: 01 21 13 44 31
# 3: 01 32 13 44 31
# 4: 01 33 13 44 31
# 5: 01 44 13 44 31
# 6: 04 21 21 33 12
# 7: 04 33 21 33 12
# 8: 05 35 35 22 -13
# 9: 05 17 35 22 -13
#10: 05 18 35 22 -13
#11: 05 21 35 22 -13
#12: 05 22 35 22 -13
#13: 09 17 17 32 15
#14: 09 13 17 32 15
#15: 09 33 17 32 15
#16: 09 32 17 32 15
#17: 09 NA 17 32 15
Upvotes: 1
Reputation: 145755
mutate
doesn't have an na.rm
option, neither does first()
or last()
which would be more appropriate here. You can omit missing values yourself with na.omit
:
firstlast <- firstlast %>% group_by(id) %>%
mutate(
var_first = first(na.omit(var_a)),
var_last = last(na.omit(var_a)),
change = var_last - var_first
)
# Source: local data frame [17 x 5]
# Groups: id [4]
#
# id var_a var_first var_last change
# (fctr) (dbl) (dbl) (dbl) (dbl)
# 1 01 13 13 44 31
# 2 01 21 13 44 31
# 3 01 32 13 44 31
# 4 01 33 13 44 31
# 5 01 44 13 44 31
# 6 04 21 21 33 12
# 7 04 33 21 33 12
# 8 05 35 35 22 -13
# 9 05 17 35 22 -13
# 10 05 18 35 22 -13
# 11 05 21 35 22 -13
# 12 05 22 35 22 -13
# 13 09 17 17 32 15
# 14 09 13 17 32 15
# 15 09 33 17 32 15
# 16 09 32 17 32 15
# 17 09 NA 17 32 15
As a sidenote, giving mutate
an na.rm
argument doesn't make much sense. What NA
values would you omit? Any rows from any columns used in the calculation? Other inputs? Just the result? How would you fill the removed rows to still have the same values? It's much better to be explicit as above.
Upvotes: 7