Reputation: 377
Using R version 3.1.3 I'm attempting to count of events in event log data.
I have a data set of timstamped events. I've cleaned the data, and have it loaded into a data.table for easier manipulation.
Colnames are OrderDate, EventDate, OrderID, EventTypeID, LocationID and EncounterID,
These events are aggregated as: EncounterID's have multiple orderID, each orderID has multiple eventID
Examples of data would be:
library(data.table)
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")
DT$OrderDate <- as.POSIXct(DT$OrderDate, format="%d/%m/%Y %H:%M")
DT$EventDate <- as.POSIXct(DT$EventDate, format="%d/%m/%Y %H:%M")
My ultimate goal is to explore this data visually using ggplot2, looking at the count of various combinations by month... but I'm having trouble aggregating the data using data.table's
My specific question (one example) How can I generate a table of of the following: Month-Year, LocationID, Count_of_Orders
If I do the following:
DT[,.N,by=.(month(OrderDate),year(OrderDate))]
I get a count of all the eventID's, but I need the Count of OrderID's per month per locationID.
month year N
1: 12 2012 8
2: 12 2013 4
BUT - what I'm looking for is results of N by Month-year by LocationID:
Month-Year,LocationID,Count_of_orders
01-12,1,2
01-12,2,1
01-13,1,0
01-13,2,1
NOTE: Notice, that for any location that doesn't have orders in a month, they should be listed with count zero. The locations would therefore need to be determined by generating a list of unique locationIDs.
Can someone please provide solutions?
Thanks
Upvotes: 3
Views: 414
Reputation: 49448
I'm assuming your date/times are in POSIXct
format (since you call month
/year
). Then,
d[, month.year := format(OrderDate, '%m-%y')]
setkey(d, month.year, LocationID, OrderID)
unique(d)[CJ(unique(month.year), unique(LocationID)), .N, by = .EACHI]
# month.year LocationID N
#1: 01-12 1 2
#2: 01-12 2 1
#3: 01-13 1 0
#4: 01-13 2 1
I used the fact that unique
by default will pick unique entries by the key, and would also preserve the key, so I can do the next join easily.
Upvotes: 2