lamecicle

Strange output with ddply when using different variable classes

I'm getting some strange output when using ddply to apply a function to two different variables. It actually completes the task correctly, but the format of the output depends on whichever variable is named first in my c(var1, var2).

All I'm trying to do is group my data frame by Conversion.ID and find the latest date and whether a click happened, which I thought would be simple.

> class(wrk.ds$intr.date.time)
[1] "POSIXct" "POSIXt" 
> class(wrk.ds$type.bin)
[1] "numeric"

> wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
+                 click.check = as.numeric(max(wrk.ds$type.bin))
+                 max.intr.date.time = max(wrk.ds$intr.date.time)
+                 c(click.check, max.intr.date.time )})
> head(wrk.ds.1)
  Conversion.ID V1         V2
1  8.930874e+15  1 1406473200
2  4.266128e+16  0 1407955140
3  1.241770e+17  0 1409494260
4  1.309763e+17  1 1407238560
5  1.367159e+17  1 1408196760
6  1.417151e+17  0 1409251260
> 
> #Reversing the c() order
> wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
+                 click.check = as.numeric(max(wrk.ds$type.bin))
+                 max.intr.date.time = max(wrk.ds$intr.date.time)
+                 c(max.intr.date.time, click.check)})
> head(wrk.ds.1)
  Conversion.ID                  V1                  V2
1  8.930874e+15 2014-07-27 16:00:00 1970-01-01 01:00:01
2  4.266128e+16 2014-08-13 19:39:00 1970-01-01 01:00:00
3  1.241770e+17 2014-08-31 15:11:00 1970-01-01 01:00:00
4  1.309763e+17 2014-08-05 12:36:00 1970-01-01 01:00:01
5  1.367159e+17 2014-08-16 14:46:00 1970-01-01 01:00:01
6  1.417151e+17 2014-08-28 19:41:00 1970-01-01 01:00:00

My work-around has been to do these in two steps, but I'm really more curious to know if this can be fixed.

I've tried the following, but to no avail.

wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(wrk.ds){
                click.check = as.numeric(max(wrk.ds$type.bin))
                max.intr.date.time = max(wrk.ds$intr.date.time)
                c(click.check, as.POSIXct(max.intr.date.time ))})

As a bonus question, can anyone tell me why the names of my newly created variables aren't being assigned to the output columns (they come out as V1 and V2)?

Answers (1)

Roland

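The anonymous function you pass to ddply should return a data.frame; yours returns a vector. c() combines its arguments into a single atomic vector and dispatches on the class of the first one: with the numeric flag first, the POSIXct class is dropped and you get seconds since 1970, while with the date-time first, the 0/1 flag is treated as a date-time 0 or 1 second after the 1970 epoch. A data.frame keeps a separate class for each column, and data.frame(click.check, max.intr.date.time) also names the columns after its arguments, which takes care of the V1/V2 names. Change it like this: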

library(plyr)

wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), function(DF){
                 # return a one-row data.frame so each column keeps its own class
                 click.check = max(DF$type.bin)
                 max.intr.date.time = max(DF$intr.date.time)
                 data.frame(click.check, max.intr.date.time)})
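To see why returning c(...) misbehaves while data.frame(...) does not, here is a minimal base R sketch outside ddply. The values are made up for illustration, and only the numeric-first order from the question is shown.

```r
# Made-up values standing in for one group's results
click.check        <- 1
max.intr.date.time <- as.POSIXct("2014-07-27 16:00:00", tz = "UTC")

# Numeric first: the default c() drops the POSIXct class,
# so the date-time becomes plain seconds since 1970-01-01 UTC
v <- c(click.check, max.intr.date.time)
print(class(v))   # "numeric"
print(v)

# c() also discards the local variable names, so ddply falls back to V1/V2
print(names(v))   # NULL

# A data.frame keeps one class per column and names columns after the arguments
df <- data.frame(click.check, max.intr.date.time)
print(sapply(df, function(col) class(col)[1]))
```

Putting the date-time first instead sends c() to its POSIXct method, which is why the flag showed up as a 1970 timestamp in the second transcript.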

Better still, use summarise, which evaluates each expression within the group and names the output columns for you:

# summarise() evaluates each named expression within the group's columns
wrk.ds.1 <- ddply(wrk.ds, .(Conversion.ID), summarise,
                   click.check = max(type.bin),
                   max.intr.date.time = max(intr.date.time))
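If plyr isn't available, the same per-group data.frame pattern can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in alternative, not part of the answer above; the toy data and the helper summarise_group are hypothetical, with column names borrowed from the question.

```r
# Hypothetical toy data standing in for wrk.ds (column names from the question)
wrk.ds <- data.frame(
  Conversion.ID  = c(1, 1, 2, 2),
  type.bin       = c(0, 1, 0, 0),
  intr.date.time = as.POSIXct(c("2014-07-27 15:00:00", "2014-07-27 16:00:00",
                                "2014-08-13 19:39:00", "2014-08-13 18:00:00"),
                              tz = "UTC")
)

# Hypothetical per-group helper that returns a one-row data.frame
summarise_group <- function(DF) {
  data.frame(Conversion.ID      = DF$Conversion.ID[1],
             click.check        = max(DF$type.bin),
             max.intr.date.time = max(DF$intr.date.time))
}

# Split by group, summarise each piece, then row-bind the one-row data.frames;
# rbind() on data.frames keeps each column's class, so POSIXct survives
pieces   <- lapply(split(wrk.ds, wrk.ds$Conversion.ID), summarise_group)
wrk.ds.1 <- do.call(rbind, pieces)
rownames(wrk.ds.1) <- NULL
print(wrk.ds.1)
```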
