Reputation: 1219
I have a dataset that consists of groups with year, month, and day values. Using the tidyverse in R, I want to filter each group so that I keep the latest available year for every month in the time series. Here is some example code.
dat = expand.grid(group = seq(1,5),year = seq(2016,2020),month=seq(1,12))
dat = dat[order(dat$group,dat$year,dat$month),]
dat$days=sample(seq(0,30),nrow(dat),replace=TRUE)
dat$year[dat$year==2020 & dat$month==12] = NA
dat = dat[complete.cases(dat),]
In this example, there are 5 groups with monthly data from 2016 to 2020. However, let's suppose December 2020 is missing for every group. Also, some days are missing in the dataset.
I can grab December from 2019, but I am not sure how to include the days in the summary and filter by the number of days in the month. For example,
library(dplyr)
a = dat %>%
  group_by(group,month) %>%
  summarise(year = max(year))
This gets the year, but I would like to keep the days value that belongs to that month and year. Does anyone know how to keep the days column? I don't want to average it, take a minimum, or otherwise aggregate it.
Upvotes: 1
Views: 194
Reputation: 886938
We can use slice_max to return the full row based on the max value of 'year' for each grouping block
library(dplyr)
dat %>%
  group_by(group, month) %>%
  slice_max(year)
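For completeness, here is a minimal, self-contained sketch of the same idea run against the question's example data. It assumes dplyr 1.0.0 or later (where slice_max was introduced). The fixed seed, the names latest and latest_filter, and the trailing ungroup() are illustrative additions, not part of the original post. The filter() variant is an alternative that should keep the same rows, though possibly in a different order.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only to make the example reproducible

# Rebuild the question's example data: 5 groups, monthly rows for 2016-2020
dat <- expand.grid(group = 1:5, year = 2016:2020, month = 1:12)
dat$days <- sample(0:30, nrow(dat), replace = TRUE)

# Drop December 2020 for every group, as in the question
dat <- dat[!(dat$year == 2020 & dat$month == 12), ]

# Keep the whole row (including days) with the latest year per group and month
latest <- dat %>%
  group_by(group, month) %>%
  slice_max(year, n = 1) %>%
  ungroup()  # hypothetical extra step: drop grouping for downstream use

# Alternative: filter on the per-group maximum year
latest_filter <- dat %>%
  group_by(group, month) %>%
  filter(year == max(year)) %>%
  ungroup()
```

In both results, December falls back to 2019 while the other months come from 2020, and the days column is carried along unchanged because no aggregation takes place.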
Upvotes: 1