Reputation: 2513
library(data.table)
df <- structure(
list(
type = c("AAA", "AAA", "AAA", "BCD", "BCD", "BCD", "EEE", "EEE", "EEE", "EEE"),
date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02", "2015-01-05", "2015-01-05", "2015-01-04", "2015-01-04", "2015-01-04", "2015-01-04")
),
.Names = c("type", "date"),
class = "data.frame",
row.names = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L))
df$date <- as.Date(df$date)
df
The code above sets up the following example data frame, named 'df':
type date
0 AAA 2015-01-01
1 AAA 2015-01-01
2 AAA 2015-01-01
3 BCD 2015-01-02
4 BCD 2015-01-05
5 BCD 2015-01-05
6 EEE 2015-01-04
7 EEE 2015-01-04
8 EEE 2015-01-04
9 EEE 2015-01-04
How would base R, data.table, or even dplyr users create a new column that lists the number of times a 'type' is recorded for a given 'date'? The desired output is:
type date typeDateGroup
0 AAA 2015-01-01 3
1 AAA 2015-01-01 3
2 AAA 2015-01-01 3
3 BCD 2015-01-02 1
4 BCD 2015-01-05 2
5 BCD 2015-01-05 2
6 EEE 2015-01-04 4
7 EEE 2015-01-04 4
8 EEE 2015-01-04 4
9 EEE 2015-01-04 4
In case it matters: unlike this small example, my real data usually has 3-5 million rows.
Don't run this; it was my attempt, and it fails:
library(data.table)
df <- as.data.table(df)
df<-df[order(type, date), `:=`(typeDateGroup = .N), by=type, date]
Thank you for looking at this and dominating with your skills.
Upvotes: 1
Views: 2705
Reputation: 92300
For future reference, in your data.table version, if you want to overwrite df, just convert it by reference with setDT(df) instead of df <- as.data.table(df).
Also, when using assignment by reference (:=) within a data.table object, there is no need for df <-.
Moreover, you can also sort your data.table using data.table's setorder function (though you don't have to, neither in this specific case nor in general).
Lastly, when passing two variables into the by argument, you should use either list(type, date), .(type, date), c("type", "date"), or "type,date".
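Putting those points together, a minimal sketch of the data.table approach might look like the following. It rebuilds the question's df with a plain data.frame() call so that it runs on its own; the column name typeDateGroup is taken from the question, and the setorder() call is optional.

```r
library(data.table)

# Rebuild the example data from the question (same values, default row names)
df <- data.frame(
  type = c("AAA", "AAA", "AAA", "BCD", "BCD", "BCD", "EEE", "EEE", "EEE", "EEE"),
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02",
                   "2015-01-05", "2015-01-05", "2015-01-04", "2015-01-04",
                   "2015-01-04", "2015-01-04"))
)

# Convert to a data.table by reference (no copy, no reassignment needed)
setDT(df)

# Add the per-group row count by reference; .N is the number of rows in each group
df[, typeDateGroup := .N, by = .(type, date)]

# Optional: sort by reference; not required for the count to be correct
setorder(df, type, date)

df
```

Because := modifies df in place, nothing is assigned back with df <-, which also avoids copying a table with millions of rows.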
So for completeness, here's the dplyr version:
library(dplyr)
df %>%
group_by(type, date) %>%
mutate(typeDateGroup = n())
Upvotes: 5
Reputation: 162451
A couple of options:
## Using base R only:
df <- transform(df, typeDateGroup=ave(as.numeric(date), type, date, FUN=length))
## With data.table:
library(data.table)
dt <- data.table(df)
dt[, typeDateGroup:=.N, by=c("type","date")]
Upvotes: 4