user44796

Reputation: 1219

keep columns after summarising using tidyverse in R

I have a dataset that consists of groups with year, month, and day values. I want to filter each group using tidyverse in R so that I keep the latest occurrence of each month in the time series. Here is some example code.

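# Build a grid of 5 groups x 5 years x 12 months
dat = expand.grid(group = seq(1,5),year = seq(2016,2020),month=seq(1,12))
dat = dat[order(dat$group,dat$year,dat$month),]
# Random day count (0-30) for each row
dat$days=sample(seq(0,30),nrow(dat),replace=TRUE)
# Drop December 2020 for every group
dat$year[dat$year==2020 & dat$month==12] = NA
dat = dat[complete.cases(dat),]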

In this example, there are 5 groups with monthly data from 2016 to 2020. However, let's suppose December 2020 is missing for every group. Also, some days are missing in the dataset.

I can grab December from 2019, but I am not sure how to include the days in the summary and filter by the number of days in the month. For example,

library(dplyr)
a = dat %>%
  group_by(group,month) %>%
  summarise(year = max(year))

gets the year, but I would like to add the correct days to the month and year. Does anyone know how to keep the days column? I don't want to average or get a minimum or anything.

Upvotes: 1

Views: 194

Answers (1)

akrun

Reputation: 886938

We can use slice_max, which returns the full row (including 'days') that has the max value of 'year' within each group/month block, instead of collapsing the block with summarise.

library(dplyr)
dat %>%
  group_by(group, month) %>%
  slice_max(year)
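
As a minimal sketch of how this behaves on the example data from the question, assuming dplyr 1.0.0 or later (where slice_max is available): the set.seed call, the with_ties = FALSE argument, the ungroup() step, and the names latest and latest_filter are additions for illustration, not part of the original answer. The filter() variant is shown only as an alternative formulation that should give the same rows here.

```r
library(dplyr)

# Rebuild the example data from the question
# (set.seed is added here only to make the random days reproducible)
set.seed(1)
dat <- expand.grid(group = 1:5, year = 2016:2020, month = 1:12)
dat$days <- sample(0:30, nrow(dat), replace = TRUE)
dat <- dat[!(dat$year == 2020 & dat$month == 12), ]  # drop December 2020

# Keep the whole row holding the latest year for each group/month pair;
# with_ties = FALSE guarantees at most one row per pair
latest <- dat %>%
  group_by(group, month) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()

# Alternative formulation: filter() keeps every row whose year equals
# the group maximum (all tied rows would be kept, if there were any)
latest_filter <- dat %>%
  group_by(group, month) %>%
  filter(year == max(year)) %>%
  ungroup()
```

Both results keep the days column untouched. December rows come from 2019, while every other month comes from 2020.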

Upvotes: 1
