Beans On Toast
Beans On Toast

Reputation: 1081

Taking the more recent row of values in R?

if i have the following dataframe:

tibble(date = c("2010_quarter_1","2010_quarter_2", "2011_end"), values = c(100,100, 2000))

How can i apply a filter condition so that when "quarter_1" and "quarter_2" both exist, then take the row that is quarter_2 only so the resulting dataframe looks like this:

tibble(date = c("2010_quarter_2", "2011_end"), values = c(100, 2000))

Is there a way to code this using filter() - for example if it became 2009_quarter_1 and 2009_quarter_2 was also present then we should take 2009_quarter_2.

But when only one is present e.g. 2009_quarter_1 then we do nothing

Additional question:

What happens if we have the following dataframe:

tibble(date = c("2010_quarter_1","2010_half", "2011_end"), values = c(100,100, 2000))

And for this condition - when 2010_quarter_1 and 2010_half both occur, we would like to take the values for 2010_half only, is there a way of doing this without having to hard code it in, e.g. if this occurs for 2011_quarter_1 and then 2011_half, it would be the same thing as to take 2011_half?

Upvotes: 1

Views: 50

Answers (1)

L Smeets
L Smeets

Reputation: 942

I think the trick is to make actual dates from the date strings and then just keep the last available date (and also keep the year data)

dat <- tibble(date = c("2010_quarter_1","2010_quarter_2", "2011_end"), values = c(100,100, 2000))

dat %>% 
  mutate(quaterly_data = stringr::str_detect(date,  "_quarter_")) %>% 
  mutate(updated_date = stringr::str_remove_all(date, "_quarter")) %>% 
  mutate(updated_date = stringr::str_replace(updated_date, "_end", "-12-31")) %>% 
  mutate(new_date = if_else(quaterly_data, lubridate::parse_date_time(updated_date, "%Y_%q"),  lubridate::parse_date_time(updated_date, "%Y-%m-%d"))) %>% 
  mutate(year = lubridate::year(new_date)) %>% 
  group_by(year) %>% 
  filter(new_date == max(new_date) | !quaterly_data ) %>% 
  ungroup() %>% 
  select(date, values)

updated answer

this code just keeps the last occurance per year, in the orde q1, q2, half, q3, q4, end

dat %>% 
  mutate(quaterly_data = stringr::str_detect(date,  "_quarter_")) %>% 
  mutate(updated_date = stringr::str_remove_all(date, "_quarter")) %>% 
  mutate(updated_date = stringr::str_replace(updated_date, "_end", "-12-31")) %>% 
  mutate(updated_date = stringr::str_replace(updated_date, "_half", "-06-30")) %>% 
  mutate(new_date = if_else(quaterly_data,
                            lubridate::parse_date_time(updated_date, "%Y_%q"), 
                            lubridate::parse_date_time(updated_date, "%Y-%m-%d"))) %>% 
  mutate(year = lubridate::year(new_date)) %>% 
  group_by(year) %>% 
  filter(new_date == max(new_date)) %>% 
  ungroup() %>% 
  select(date, values)

Upvotes: 1

Related Questions