Reputation: 161
I want to check a data.table for all dates that have a given category, and set all those dates to TRUE
in a separate data.table. I have a data.table that looks something like this:
library(data.table)
DT1 = fread(
'Date Category
2010-01-01 A
2010-01-01 B
2010-01-02 A
2010-01-02 C
2010-01-02 D
2010-01-04 B
2010-01-04 "B OR D"')
DT1[, Date := as.IDate(Date) ]
and I'm trying to extract the values to a data frame that looks like this:
DT2 = fread('
Date A B C D
2010-01-01 FALSE FALSE FALSE FALSE
2010-01-02 FALSE FALSE FALSE FALSE
2010-01-03 FALSE FALSE FALSE FALSE
2010-01-04 FALSE FALSE FALSE FALSE
2010-01-05 FALSE FALSE FALSE FALSE')
DT2[, Date := as.IDate(Date) ]
If a particular category exists for a given date, I would like to set the cell with the row's date and the category's column to true. I know this will probably involve something like:
DT2 <- DT2[Date %in% DT1$Date, A := grep(something)]
I want this call to modify DT2 to look like this:
Date A B C D
2010-01-01 TRUE FALSE FALSE FALSE
2010-01-02 TRUE FALSE FALSE FALSE
2010-01-03 FALSE FALSE FALSE FALSE
2010-01-04 FALSE FALSE FALSE FALSE
2010-01-05 FALSE FALSE FALSE FALSE
Right now, my code uses any(grep())
, which is a problem because it changes DT2$A[4]
to TRUE
, which I don't want. I would like to use a regular expression, since some of my category cells refer to multiple values. Because of the last cell, both my checks on the value B and the value D should set their dummy variables in 2010-01-04
to true. As the last 2 rows show, there may be one date which contains multiple references to a single category.
Is there a way to do this in data.table? I can use other packages if absolutely necessary.
Note that I can't just use dcast
to transform D1 into the proper data.table, because it doesn't contain all the dates I need.
Upvotes: 2
Views: 538
Reputation: 93813
Here's one attempt created by looping over the possible categories (hat-tip to @Frank for adjustments):
DF1[
.(seq(min(Date), max(Date) + 1L, by="day")),
on=.(Date),
lapply(sapply(c("A","B","C","D"), grepl, x=Category, USE.NAMES=TRUE, simplify=FALSE),any),
by=.EACHI
]
# Date A B C D
#1: 2010-01-01 TRUE TRUE FALSE FALSE
#2: 2010-01-02 TRUE FALSE TRUE TRUE
#3: 2010-01-03 FALSE FALSE FALSE FALSE
#4: 2010-01-04 FALSE TRUE FALSE TRUE
#5: 2010-01-05 FALSE FALSE FALSE FALSE
Upvotes: 2