Reputation: 191
I want to find the non-missing value of each group associated with the largest index value, for many columns.
I have gotten fairly close by using summarize_all with which.max but I am not sure how to remove the NAs from each vector before I find the latest value. I read about using na.rm in summarize_all with functions like mean but not sure how to incorporate similar functionality without a built in function. I have tried na.omit but it doesnt provide the solution I'm looking for.
a <- head(iris, 10)
a$num <- 1:10
a$grp <- c("a","a","a","b","b","c","c","d","d","d")
a[10, "Species"] <- NA
a %>%
group_by(grp) %>%
summarize_all(funs(na.omit(.)[which.max(num)]))
grp Sepal.Length Sepal.Width Petal.Length Petal.Width Species num
<chr> <dbl> <dbl> <dbl> <dbl> <fct> <int>
1 a 4.70 3.20 1.30 0.200 setosa 3
2 b 5.00 3.60 1.40 0.200 setosa 5
3 c 4.60 3.40 1.40 0.300 setosa 7
4 d 4.90 3.10 1.50 0.100 NA 10
I expect all the values in the Species column to be setosa, however the last value is NA.
Upvotes: 2
Views: 112
Reputation: 48231
Instead of looking at all num
, we may look only at those where the corresponding variable is not NA
:
a %>%
group_by(grp) %>%
summarize_all(funs(na.omit(.)[which.max(num[!is.na(.)])]))
# A tibble: 4 x 7
# grp Sepal.Length Sepal.Width Petal.Length Petal.Width Species num
# <chr> <dbl> <dbl> <dbl> <dbl> <fct> <int>
# 1 a 4.7 3.2 1.3 0.2 setosa 3
# 2 b 5 3.6 1.4 0.2 setosa 5
# 3 c 4.6 3.4 1.4 0.3 setosa 7
# 4 d 4.9 3.1 1.5 0.1 setosa 10
Upvotes: 1
Reputation: 7312
You could also approach this a little differently and complete the NA
case first:
library(tidyverse)
a %>% group_by(grp) %>%
fill(Species) %>%
filter(num == max(num))
tibble: 4 x 7
# Groups: grp [4]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species num grp
<dbl> <dbl> <dbl> <dbl> <fct> <int> <chr>
1 4.7 3.2 1.3 0.2 setosa 3 a
2 5 3.6 1.4 0.2 setosa 5 b
3 4.6 3.4 1.4 0.3 setosa 7 c
4 4.9 3.1 1.5 0.1 setosa 10 d
Upvotes: 1
Reputation: 329
if you use a data.table approach, you can try:
library (data.table)
a = data.table (a)
a [is.finite (Species), by = grp, .SD [which.max (num) ] ]
Upvotes: 1