R new column summarizing count of groups of columns

Question

library(data.table)
df <- structure(
  list(
    type = c("AAA", "AAA", "AAA", "BCD", "BCD", "BCD", "EEE", "EEE", "EEE", "EEE"), 
    date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02", "2015-01-05", "2015-01-05", "2015-01-04", "2015-01-04", "2015-01-04", "2015-01-04")
    ), 
  .Names = c("type", "date"), 
  class = "data.frame", 
  row.names = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L))
df$date <- as.Date(df$date)
df

This sets up the following example data frame, named 'df':

  type       date
0  AAA 2015-01-01
1  AAA 2015-01-01
2  AAA 2015-01-01
3  BCD 2015-01-02
4  BCD 2015-01-05
5  BCD 2015-01-05
6  EEE 2015-01-04
7  EEE 2015-01-04
8  EEE 2015-01-04
9  EEE 2015-01-04

How would base R, data.table, or even dplyr users create a new column that lists the number of times a 'type' is recorded for a given 'date'? The desired result:

  type       date typeDateGroup
0  AAA 2015-01-01             3
1  AAA 2015-01-01             3
2  AAA 2015-01-01             3
3  BCD 2015-01-02             1
4  BCD 2015-01-05             2
5  BCD 2015-01-05             2
6  EEE 2015-01-04             4
7  EEE 2015-01-04             4
8  EEE 2015-01-04             4
9  EEE 2015-01-04             4

In case it helps: unlike this example, my real data usually has 3-5 million rows.

Don't run this; it was my attempt, and it fails:

library(data.table)
df <- as.data.table(df)
df<-df[order(type, date), `:=`(typeDateGroup = .N), by=type, date]

Thank you for looking at this and dominating with your skills.

Josh O'Brien · Accepted Answer

A couple of options:

## Using base R only:
df <- transform(df, typeDateGroup=ave(as.numeric(date), type, date, FUN=length))

## With data.table:
library(data.table)
dt <- data.table(df)
dt[, typeDateGroup:=.N, by=c("type","date")]
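For context on why the attempt in the question fails: in `by=type, date` the comma ends the `by` argument, so `date` is passed to `[` as a separate argument rather than as a second grouping column. Both columns need to sit inside one `by`, either as a character vector as above or with data.table's `.()` shorthand. The following is a minimal sketch of that, assuming the example data from the question; the name `base_counts` is introduced here only for illustration, and the base R line is a cross-check rather than part of the original answer.

```r
library(data.table)

# Rebuild the question's example data
df <- data.frame(
  type = c("AAA", "AAA", "AAA", "BCD", "BCD", "BCD", "EEE", "EEE", "EEE", "EEE"),
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02",
                   "2015-01-05", "2015-01-05", "2015-01-04", "2015-01-04",
                   "2015-01-04", "2015-01-04")),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)

# Both grouping columns go inside a single `by` argument.
# `:=` adds the column by reference, so the result is not reassigned.
dt[, typeDateGroup := .N, by = .(type, date)]

# Cross-check with base R: count rows per (type, date) combination,
# using the row index as the vector that ave() splits by group.
base_counts <- ave(seq_len(nrow(df)), df$type, df$date, FUN = length)

print(dt)
```

Because `:=` modifies `dt` in place, no copy of the table is made, which is likely to matter with a few million rows. The `order(type, date)` call from the attempt is not needed for the count itself; if sorted output is wanted, `setorder(dt, type, date)` can be run as a separate step.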
