tcs
tcs

Reputation: 377

How to aggregate results by date (month, year, day) and subgroup using data.table

Using R version 3.1.3 I'm attempting to count of events in event log data.

I have a data set of timstamped events. I've cleaned the data, and have it loaded into a data.table for easier manipulation.

Colnames are OrderDate, EventDate, OrderID, EventTypeID, LocationID and EncounterID,

These events are aggregated as: EncounterID's have multiple orderID, each orderID has multiple eventID

Examples of data would be:

library(data.table) 
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID 
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")


DT$OrderDate <- as.POSIXct(DT$OrderDate, format="%d/%m/%Y %H:%M")
DT$EventDate <- as.POSIXct(DT$EventDate, format="%d/%m/%Y %H:%M")

My ultimate goal is to explore this data visually using ggplot2, looking at the count of various combinations by month... but I'm having trouble aggregating the data using data.table's

My specific question (one example) How can I generate a table of of the following: Month-Year, LocationID, Count_of_Orders

If I do the following:

DT[,.N,by=.(month(OrderDate),year(OrderDate))]

I get a count of all the eventID's, but I need the Count of OrderID's per month per locationID.

   month year N
1:    12 2012 8
2:    12 2013 4

BUT - what I'm looking for is results of N by Month-year by LocationID:

Month-Year,LocationID,Count_of_orders
01-12,1,2
01-12,2,1
01-13,1,0
01-13,2,1

NOTE: Notice, that for any location that doesn't have orders in a month, they should be listed with count zero. The locations would therefore need to be determined by generating a list of unique locationIDs.

Can someone please provide solutions?

Thanks

Upvotes: 3

Views: 414

Answers (1)

eddi
eddi

Reputation: 49448

I'm assuming your date/times are in POSIXct format (since you call month/year). Then,

d[, month.year := format(OrderDate, '%m-%y')]

setkey(d, month.year, LocationID, OrderID)

unique(d)[CJ(unique(month.year), unique(LocationID)), .N, by = .EACHI]
#   month.year LocationID N
#1:      01-12          1 2
#2:      01-12          2 1
#3:      01-13          1 0
#4:      01-13          2 1

I used the fact that unique by default will pick unique entries by the key, and would also preserve the key, so I can do the next join easily.

Upvotes: 2

Related Questions