Mr. Buster
Mr. Buster

Reputation: 131

R: Count observations that satisfy multiple criteria by row in DataFrame

I have a Summary_Table that has Store, Product, and Date. And I need to create a 4th column that counts the number of observations in the Records_table that meets multiple criteria per row of the Summary_Table.

This is my summary table.

> Summary_Table
Store   Product Date
Chain1  ABC     1/1/2007
Chain1  ABC     1/1/2008
Chain1  ABC     1/1/2009
Chain1  ABC     1/1/2010
Chain1  DEF     1/1/2007
Chain1  DEF     1/1/2008
Chain1  DEF     1/1/2009
Chain1  DEF     1/1/2010
Chain2  ABC     1/1/2007
Chain2  ABC     1/1/2008
Chain2  ABC     1/1/2009
Chain2  ABC     1/1/2010
Chain2  DEF     1/1/2007
Chain2  DEF     1/1/2008
Chain2  DEF     1/1/2009
Chain2  DEF     1/1/2010

And this is my Records Table

> Records_Table
Store   Product Eval_Date
Chain1  DEF     2/13/2008
Chain2  ABC     1/15/2009
Chain1  DEF     5/13/2009
Chain2  ABC     3/15/2007
Chain1  DEF     5/14/2010
Chain2  DEF     10/10/2010
Chain1  ABC     11/1/2007
Chain2  ABC     10/1/2008
Chain2  DEF     6/1/2009
Chain2  DEF     7/1/2010
Chain2  ABC     8/1/2009
Chain1  ABC     2/1/2009

So I need to create a 4th column in Summary table that counts the number of observations in Records table. That meets the following criteria per row of the summary table.

Store should be equal, Product should be equal, And for the dates; if the (Records_Table$Eval_Date) is greater than (Summary_Table$Date).

So my desired output in the Summary_Table is

> Summary_Table
Store   Product Date      Count
Chain1  ABC     1/1/2007    2
Chain1  ABC     1/1/2008    1
Chain1  ABC     1/1/2009    1
Chain1  ABC     1/1/2010    0
Chain1  DEF     1/1/2007    3
Chain1  DEF     1/1/2008    3
Chain1  DEF     1/1/2009    2
Chain1  DEF     1/1/2010    1
Chain2  ABC     1/1/2007    4
Chain2  ABC     1/1/2008    3
Chain2  ABC     1/1/2009    2
Chain2  ABC     1/1/2010    0
Chain2  DEF     1/1/2007    3
Chain2  DEF     1/1/2008    3
Chain2  DEF     1/1/2009    3
Chain2  DEF     1/1/2010    2

Upvotes: 1

Views: 305

Answers (2)

Rushabh Patel
Rushabh Patel

Reputation: 2764

You can do left join using data.table package-

library(data.table)
setDT(recdt)[,Eval_Date:=as.Date(Eval_Date,format="%m/%d/%Y")]
setDT(sumdt)[,Date:=as.Date(Date,format="%m/%d/%Y")]
recdt[sumdt,.N, on=c("Store","Product","Eval_Date>Date"), by=.EACHI]

Output-

    Store  Product Eval_Date  N
 1: Chain1     ABC 2007-01-01 2
 2: Chain1     ABC 2008-01-01 1
 3: Chain1     ABC 2009-01-01 1
 4: Chain1     ABC 2010-01-01 0
 5: Chain1     DEF 2007-01-01 3
 6: Chain1     DEF 2008-01-01 3
 7: Chain1     DEF 2009-01-01 2
 8: Chain1     DEF 2010-01-01 1
 9: Chain2     ABC 2007-01-01 4
10: Chain2     ABC 2008-01-01 3
11: Chain2     ABC 2009-01-01 2
12: Chain2     ABC 2010-01-01 0
13: Chain2     DEF 2007-01-01 3
14: Chain2     DEF 2008-01-01 3
15: Chain2     DEF 2009-01-01 3
16: Chain2     DEF 2010-01-01 2

Upvotes: 1

olorcain
olorcain

Reputation: 1248

I'm assuming that the date column in your summary table is a grouping by year. If so, using lubridate and dplyr:

library(dplyr)
library(lubridate)

Records <- Records %>% 
  mutate(Eval_Date = mdy(Eval_Date), Year = year(Eval_Date)) 

summary <- Records %>%
  group_by(Store, Product, Year) %>%
  summarise(Count = n()) %>% 

Upvotes: 0

Related Questions