Reputation: 581
I'm getting some strange output when using ddply to apply a function to two different variables. It completes the task correctly, but the format of both output columns seems to be determined by whichever variable is named first in my c(var1, var2).
All I'm trying to achieve is to group my data frame by Conversion.ID and find the max date and whether a click happened, which I thought would be simple.
> class(wrk.ds$intr.date.time)
[1] "POSIXct" "POSIXt"
> class(wrk.ds$type.bin)
[1] "numeric"
> wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
+ click.check = as.numeric(max(wrk.ds$type.bin))
+ max.intr.date.time = max(wrk.ds$intr.date.time)
+ c(click.check, max.intr.date.time )})
> head(wrk.ds.1)
Conversion.ID V1 V2
1 8.930874e+15 1 1406473200
2 4.266128e+16 0 1407955140
3 1.241770e+17 0 1409494260
4 1.309763e+17 1 1407238560
5 1.367159e+17 1 1408196760
6 1.417151e+17 0 1409251260
>
> #Reversing the c() order
> wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
+ click.check = as.numeric(max(wrk.ds$type.bin))
+ max.intr.date.time = max(wrk.ds$intr.date.time)
+ c(max.intr.date.time, click.check)})
> head(wrk.ds.1)
Conversion.ID V1 V2
1 8.930874e+15 2014-07-27 16:00:00 1970-01-01 01:00:01
2 4.266128e+16 2014-08-13 19:39:00 1970-01-01 01:00:00
3 1.241770e+17 2014-08-31 15:11:00 1970-01-01 01:00:00
4 1.309763e+17 2014-08-05 12:36:00 1970-01-01 01:00:01
5 1.367159e+17 2014-08-16 14:46:00 1970-01-01 01:00:01
6 1.417151e+17 2014-08-28 19:41:00 1970-01-01 01:00:00
My work-around has been to do these in two steps, but I'm really more curious to know if this can be fixed.
I've tried the following, but to no avail.
wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
click.check = as.numeric(max(wrk.ds$type.bin))
max.intr.date.time = max(wrk.ds$intr.date.time)
c(click.check, as.POSIXct(max.intr.date.time ))})
As a bonus question, can anyone tell me why the names of my newly created variables aren't being assigned to the output columns?
Upvotes: 0
Views: 97
Reputation: 132706
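The anonymous function you pass to ddply should return a data.frame, but yours returns a vector. c() combines its arguments into a single vector with one class, so the numeric flag and the POSIXct date cannot keep separate types, and because the elements of that vector are unnamed, the columns come out as V1 and V2. Change it like this: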
wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(DF){
  click.check = max(DF$type.bin)
  max.intr.date.time = max(DF$intr.date.time)
  data.frame(click.check, max.intr.date.time)
})
Of course, you should use summarise instead:
wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), summarise,
click.check = max(type.bin),
max.intr.date.time = max(intr.date.time))
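To see why returning a vector caused the problem, here is a minimal base R sketch that needs no plyr. The two values are made up and only stand in for one group's results; the time zone is set to UTC purely so the example is reproducible. With the numeric value first, c() uses its default method and drops the POSIXct class, which matches the raw seconds seen in the question's first output. With the date-time first, the POSIXct method of c() is used instead, which is consistent with the click flag being printed as a time one second after the epoch.

```r
# Hypothetical values standing in for the results of a single group
click.check <- 1
max.intr.date.time <- as.POSIXct("2014-07-27 16:00:00", tz = "UTC")

# Numeric first: the default c() method drops the POSIXct class,
# leaving the date-time as seconds since 1970-01-01
num.first <- c(click.check, max.intr.date.time)
class(num.first)

# A one-row data.frame keeps each column's own class,
# and takes the column names from the variable names
row <- data.frame(click.check, max.intr.date.time)
sapply(row, function(col) class(col)[1])
names(row)
```

Because each group now yields a one-row data.frame, ddply can bind the rows together while preserving both the column classes and the column names.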
Upvotes: 1