AntVal
AntVal

Reputation: 665

Filter a dataset only to after first NA in column in R

I have a dataset where I am trying to filter to only those after the first non-NA observation, based on an ordering of dates.

mock.data <- data.frame( id = c(1, 1, 1, 1, 1,  
                            2, 2, 2, 2, 2,
                            3, 3, 3, 3, 3 ),
                     date = as.Date(c("1934-06-03", "1938-06-17",  "1943-06-23", "1948-06-17", "1953-06-23",
                              "1911-09-24", "1914-04-07", "1917-09-16", "1920-09-17", "1924-09-17",
                              "2008-09-09", "2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
                     price = c(33, 54, NA, 55, 67,
                               NA, NA, 19, NA, 22,
                               NA, 98, 87, 102, NA))

mock.data



 id       date price
  1 1934-06-03    33
  1 1938-06-17    54
  1 1943-06-23    NA
  1 1948-06-17    55
  1 1953-06-23    67
  2 1911-09-24    NA
  2 1914-04-07    NA
  2 1917-09-16    19
  2 1920-09-17    NA
  2 1924-09-17    22
  3 2008-09-09    NA
  3 2012-10-06    98
  3 2016-10-14    87
  3 2020-03-03   102
  3 2022-04-14    NA

And I want I want is to basically filter it to only those values after the first NA in price for each id, but keeping NA values that occur after the first non-NA. So, ideally, I would obtain the following:

ideal.data <- data.frame( id = c(1, 1, 1, 1, 1, 
                              2, 2, 2,
                             3, 3, 3,3 ),
                      date = as.Date(c("1934-06-03", "1938-06-17",  "1943-06-23", "1948-06-17", "1953-06-23",
                                       "1917-09-16", "1920-09-17", "1924-09-17",
                                       "2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
                      price = c(33, 54, NA, 55, 67,
                                19,NA, 22,
                                98, 87, 102, NA))

I've tried multiple things, mostly based on tidy, like the following:

library(tidyverse)


mock.data%>%
group_by(id)%>%
  arrange(date)%>%
  filter( date > date[min(is.na(price))])

But I'm getting a lot of errors and can't quite get what I am looking for. Any help is very welcomed!

Upvotes: 2

Views: 172

Answers (2)

akrun
akrun

Reputation: 886938

We can use cummax

library(dplyr)
mock.data %>%
   group_by(id) %>%
   filter(cummax(!is.na(price)) > 0) %>%
   ungroup

-output

# A tibble: 12 x 3
#      id date       price
#   <dbl> <date>     <dbl>
# 1     1 1934-06-03    33
# 2     1 1938-06-17    54
# 3     1 1943-06-23    NA
# 4     1 1948-06-17    55
# 5     1 1953-06-23    67
# 6     2 1917-09-16    19
# 7     2 1920-09-17    NA
# 8     2 1924-09-17    22
# 9     3 2012-10-06    98
#10     3 2016-10-14    87
#11     3 2020-03-03   102
#12     3 2022-04-14    NA

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

A base R option using ave

subset(
  mock.data,
  ave(!is.na(price), id, FUN = function(v) seq_along(v) >= which(v)[1])
)

gives

   id       date price
1   1 1934-06-03    33
2   1 1938-06-17    54
3   1 1943-06-23    NA
4   1 1948-06-17    55
5   1 1953-06-23    67
8   2 1917-09-16    19
9   2 1920-09-17    NA
10  2 1924-09-17    22
12  3 2012-10-06    98
13  3 2016-10-14    87
14  3 2020-03-03   102
15  3 2022-04-14    NA

Upvotes: 1

Related Questions