Reputation: 507
I am attempting to write a ddply summarize statement that works on a vector of POSIXct times. For each user.nm I simply want to get the largest and smallest timestamp associated with their name. The data looks like this:
test.data=structure(list(user.nm = structure(c(1L, 1L, 2L, 3L, 4L, 4L), .Label = c("a",
"b", "c", "d"), class = "factor"), ip.addr.txt = structure(c(1L,
2L, 3L, 4L, 5L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"),
login.dt = structure(c(4L, 3L, 5L, 1L, 2L, 6L), .Label = c("11/20/2013",
"12/26/2013", "3/11/2013", "6/25/2013", "6/27/2013", "7/15/2013"
), class = "factor"), login.time = structure(c(3L, 4L, 6L,
1L, 2L, 5L), .Label = c("10:16:17", "11:07:27", "13:22:32",
"13:55:05", "9:23:33", "9:49:23"), class = "factor"), login.sessn.ts = structure(c(1372180920,
1363024500, 1372340940, 1384960560, 1388074020, 1373894580
), class = c("POSIXct", "POSIXt"), tzone = ""), month = structure(c(3L,
4L, 3L, 5L, 1L, 2L), .Label = c("Dec-2013", "Jul-2013", "Jun-2013",
"Mar-2013", "Nov-2013"), class = "factor"), quarter = c(2L,
1L, 2L, 4L, 4L, 3L), change.label = c(TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE)), .Names = c("user.nm", "ip.addr.txt", "login.dt",
"login.time", "login.sessn.ts", "month", "quarter", "change.label"
), row.names = c(NA, -6L), class = "data.frame")
The plyr statement looks like this:
user.changes=ddply(test.data, c("user.nm"), summarize,
change.count=sum(ip.label.txt),
max.change.time=max(login.sessn.ts),
min.change.time=min(login.sessn.ts))
The error I'm getting is this:
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [2]
I'm having trouble interpreting what this error actually means. Apparently one person's solution involved converting the POSIXct class to character, which doesn't really work in my case.
Could anyone shed some light on how to make this work? I'm open to other approaches as well; I just like the relative simplicity of ddply's syntax. I'll be working with more time-based data in the near future, so I would appreciate any insight on how to approach this type of aggregation problem with other R-based tools.
Upvotes: 0
Views: 161
Reputation: 2950
I checked your data with str, and it turned out that your dates were actually factors. You can make them dates with lubridate:
library(lubridate)
test.data2 <- transform(test.data,lst = dmy_hm(login.sessn.ts))
ddply(test.data2, c("user.nm"), summarize,
change.count=sum(ip.addr.txt),
max.change.time=max(lst),
min.change.time=min(lst))
  user.nm change.count     max.change.time     min.change.time
1       a            3 2013-11-03 13:55:00 2013-01-06 12:03:44
2       b            3 2013-01-06 08:35:32 2013-01-06 08:35:32
3       c            4 2013-01-11 10:16:00 2013-01-11 10:16:00
4       d           10 2046-11-24 13:24:29 2013-01-12 11:08:04
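For comparison, here is a minimal base R sketch of the same per-user min/max idea. It is not the code from the question or from the lubridate call above. It assumes the timestamps are rebuilt from the login.dt and login.time factors in month/day/year order, uses a small made-up sample (logins), and introduces the names ts, per.user and login.count purely for illustration. As far as I know, the 'names' attribute error is commonly reported when a column handed to ddply is POSIXlt rather than POSIXct, so the sketch keeps everything as POSIXct.

```r
# Illustrative sample that mirrors the question's column names (values are made up).
logins <- data.frame(
  user.nm    = c("a", "a", "b", "d", "d"),
  login.dt   = c("6/25/2013", "3/11/2013", "6/27/2013", "12/26/2013", "7/15/2013"),
  login.time = c("13:22:32", "13:55:05", "9:49:23", "11:07:27", "9:23:33"),
  stringsAsFactors = TRUE
)

# paste() uses the factor labels, so date and time can be joined and parsed
# into a single POSIXct column (assumed month/day/year order, UTC time zone).
logins$ts <- as.POSIXct(paste(logins$login.dt, logins$login.time),
                        format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Split by user, summarise each piece, and bind the pieces back together.
# rbind() on data frames keeps the POSIXct class of the timestamp columns.
per.user <- do.call(rbind, lapply(split(logins, logins$user.nm), function(d) {
  data.frame(user.nm         = as.character(d$user.nm[1]),
             login.count     = nrow(d),
             max.change.time = max(d$ts),
             min.change.time = min(d$ts))
}))
rownames(per.user) <- NULL
per.user
```

If you prefer ddply's syntax, the same ts column should be usable inside summarize in place of the split/lapply step, provided it stays POSIXct.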
Upvotes: 0