Reputation: 665
I have a dataset where I am trying to filter to only those after the first non-NA observation, based on an ordering of dates.
mock.data <- data.frame( id = c(1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
3, 3, 3, 3, 3 ),
date = as.Date(c("1934-06-03", "1938-06-17", "1943-06-23", "1948-06-17", "1953-06-23",
"1911-09-24", "1914-04-07", "1917-09-16", "1920-09-17", "1924-09-17",
"2008-09-09", "2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
price = c(33, 54, NA, 55, 67,
NA, NA, 19, NA, 22,
NA, 98, 87, 102, NA))
mock.data
id date price
1 1934-06-03 33
1 1938-06-17 54
1 1943-06-23 NA
1 1948-06-17 55
1 1953-06-23 67
2 1911-09-24 NA
2 1914-04-07 NA
2 1917-09-16 19
2 1920-09-17 NA
2 1924-09-17 22
3 2008-09-09 NA
3 2012-10-06 98
3 2016-10-14 87
3 2020-03-03 102
3 2022-04-14 NA
And I want I want is to basically filter it to only those values after the first NA
in price
for each id
, but keeping NA
values that occur after the first non-NA. So, ideally, I would obtain the following:
ideal.data <- data.frame( id = c(1, 1, 1, 1, 1,
2, 2, 2,
3, 3, 3,3 ),
date = as.Date(c("1934-06-03", "1938-06-17", "1943-06-23", "1948-06-17", "1953-06-23",
"1917-09-16", "1920-09-17", "1924-09-17",
"2012-10-06", "2016-10-14", "2020-03-03", "2022-04-14")),
price = c(33, 54, NA, 55, 67,
19,NA, 22,
98, 87, 102, NA))
I've tried multiple things, mostly based on tidy, like the following:
library(tidyverse)
mock.data%>%
group_by(id)%>%
arrange(date)%>%
filter( date > date[min(is.na(price))])
But I'm getting a lot of errors and can't quite get what I am looking for. Any help is very welcomed!
Upvotes: 2
Views: 172
Reputation: 886938
We can use cummax
library(dplyr)
mock.data %>%
group_by(id) %>%
filter(cummax(!is.na(price)) > 0) %>%
ungroup
-output
# A tibble: 12 x 3
# id date price
# <dbl> <date> <dbl>
# 1 1 1934-06-03 33
# 2 1 1938-06-17 54
# 3 1 1943-06-23 NA
# 4 1 1948-06-17 55
# 5 1 1953-06-23 67
# 6 2 1917-09-16 19
# 7 2 1920-09-17 NA
# 8 2 1924-09-17 22
# 9 3 2012-10-06 98
#10 3 2016-10-14 87
#11 3 2020-03-03 102
#12 3 2022-04-14 NA
Upvotes: 1
Reputation: 101064
A base R option using ave
subset(
mock.data,
ave(!is.na(price), id, FUN = function(v) seq_along(v) >= which(v)[1])
)
gives
id date price
1 1 1934-06-03 33
2 1 1938-06-17 54
3 1 1943-06-23 NA
4 1 1948-06-17 55
5 1 1953-06-23 67
8 2 1917-09-16 19
9 2 1920-09-17 NA
10 2 1924-09-17 22
12 3 2012-10-06 98
13 3 2016-10-14 87
14 3 2020-03-03 102
15 3 2022-04-14 NA
Upvotes: 1