Jake Fisher

Reputation: 3310

How do I reference the number of rows in a group in dplyr?

I'm trying to write a function to use with dplyr that uses the number of rows in the group. Is there any way to reference the number of rows in the group in dplyr, other than just creating a new column? This would be equivalent to the .N variable in data.table.

Here's an example of what I'm trying to do:

library(dplyr)
library(RcppRoll)

# Function I'm trying to create
rollingMean <- function(x, n = 4) {
  if (.N < n) {  # I want to test whether the group has fewer than n rows
    out <- mean(x)  # if so, return the overall mean
  } else {
    out <- roll_meanr(x, n)
  }
  return(out)
}

# Fake data
tmp <- data.frame(X = 1:21, grouping = c(rep(letters[1:2], 10), letters[3]))

tmp %>%
  group_by(grouping) %>%
  mutate(ma = rollingMean(X)) %>%
  tail  # Of course, this doesn't work, but the value for ma for the last row should be 21

This seems like it would be fairly simple to do. Does anyone know how to do it?

Upvotes: 0

Views: 700

Answers (1)

Ben Bolker

Reputation: 226871

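I think the test in rollingMean just needs to be

if (length(x) < n)

Within a grouped mutate(), the function is called once per group and x holds only that group's values, so length(x) is the number of rows in the group.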

dplyr does have an n() function (see ?n), but it is special:

... can only be used from within ‘summarise’, ‘mutate’ and ‘filter’ ...
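
For completeness, here is a sketch of the question's example with that change applied. It assumes dplyr and RcppRoll are installed, as in the question. The ma_n column and the res name are additions for illustration only: ma_n calls n() directly inside mutate(), which is one of the places the documentation says it can be used, so the two columns can be compared.

```r
library(dplyr)
library(RcppRoll)

# Rolling mean that falls back to the overall mean for short groups.
# In a grouped mutate(), x is only the current group's values,
# so length(x) plays the role of data.table's .N here.
rollingMean <- function(x, n = 4) {
  if (length(x) < n) {
    mean(x)              # fewer than n rows: one value, recycled by mutate()
  } else {
    roll_meanr(x, n)     # right-aligned rolling mean, NA for the first n - 1 rows
  }
}

# Fake data from the question: groups a and b have 10 rows each, c has 1
tmp <- data.frame(X = 1:21, grouping = c(rep(letters[1:2], 10), letters[3]))

res <- tmp %>%
  group_by(grouping) %>%
  mutate(
    ma   = rollingMean(X),
    # Hypothetical comparison column: the same test written with n()
    ma_n = if (n() < 4) mean(X) else roll_meanr(X, 4)
  ) %>%
  ungroup()

tail(res)
```

The last row (X = 21, the only row in group c) gets ma = 21, which is what the question asked for.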

Upvotes: 1
