Reputation: 1081
if i have the following dataframe:
tibble(date = c("2010_quarter_1","2010_quarter_2", "2011_end"), values = c(100,100, 2000))
How can i apply a filter condition so that when "quarter_1" and "quarter_2" both exist, then take the row that is quarter_2 only so the resulting dataframe looks like this:
tibble(date = c("2010_quarter_2", "2011_end"), values = c(100, 2000))
Is there a way to code this using filter() - for example if it became 2009_quarter_1 and 2009_quarter_2 was also present then we should take 2009_quarter_2.
But when only one is present e.g. 2009_quarter_1 then we do nothing
Additional question:
What happens if we have the following dataframe:
tibble(date = c("2010_quarter_1","2010_half", "2011_end"), values = c(100,100, 2000))
And for this condition - when 2010_quarter_1 and 2010_half both occur, we would like to take the values for 2010_half only, is there a way of doing this without having to hard code it in, e.g. if this occurs for 2011_quarter_1 and then 2011_half, it would be the same thing as to take 2011_half?
Upvotes: 1
Views: 50
Reputation: 942
I think the trick is to make actual dates from the date strings and then just keep the last available date (and also keep the year data)
dat <- tibble(date = c("2010_quarter_1","2010_quarter_2", "2011_end"), values = c(100,100, 2000))
dat %>%
mutate(quaterly_data = stringr::str_detect(date, "_quarter_")) %>%
mutate(updated_date = stringr::str_remove_all(date, "_quarter")) %>%
mutate(updated_date = stringr::str_replace(updated_date, "_end", "-12-31")) %>%
mutate(new_date = if_else(quaterly_data, lubridate::parse_date_time(updated_date, "%Y_%q"), lubridate::parse_date_time(updated_date, "%Y-%m-%d"))) %>%
mutate(year = lubridate::year(new_date)) %>%
group_by(year) %>%
filter(new_date == max(new_date) | !quaterly_data ) %>%
ungroup() %>%
select(date, values)
this code just keeps the last occurance per year, in the orde q1, q2, half, q3, q4, end
dat %>%
mutate(quaterly_data = stringr::str_detect(date, "_quarter_")) %>%
mutate(updated_date = stringr::str_remove_all(date, "_quarter")) %>%
mutate(updated_date = stringr::str_replace(updated_date, "_end", "-12-31")) %>%
mutate(updated_date = stringr::str_replace(updated_date, "_half", "-06-30")) %>%
mutate(new_date = if_else(quaterly_data,
lubridate::parse_date_time(updated_date, "%Y_%q"),
lubridate::parse_date_time(updated_date, "%Y-%m-%d"))) %>%
mutate(year = lubridate::year(new_date)) %>%
group_by(year) %>%
filter(new_date == max(new_date)) %>%
ungroup() %>%
select(date, values)
Upvotes: 1