Salmo salar

Reputation: 557

ddply multiple function arguments + naming

Browsing other questions, I have almost solved my problem, but I am failing at the last hurdle...

I am using R.

I have a data frame (d) which I pass through a function (fd) using ddply from the plyr package; this returns a data frame as expected. In my actual data frame I have a large number of variables I want to pass into the function, and I would rather not call it once for each of them. I would also like to give relevant column names to the output data frame. I have tried to show my workings step by step below...

Sample data:

d<-structure(list(date.time = structure(c(1367943040, 1367950947, 
1367950965, 1367950987, 1367951028, 1367951045, 1367959536, 1367960275, 
1367960413, 1367985859, 1368005216, 1368005233, 1368011698, 1368011931, 
1368012615, 1368033855), tzone = "", class = c("POSIXct", "POSIXt"
)), station = c("L5", "L5", "L5", "L5", "L5", "L5", "L7", "L7", 
"L7", "L7", "L5", "L5", "L7", "L7", "L7", "L7"), code = c(10891, 
10891, 10891, 10891, 10891, 10891, 10891, 10891, 10891, 10891, 
10888, 10888, 10888, 10888, 10888, 10888)), .Names = c("date.time", 
"station", "code"), row.names = c(2421L, 2466L, 2467L, 2468L, 
2469L, 2470L, 2472L, 2473L, 2474L, 2812L, 2837L, 2838L, 2859L, 
2860L, 2861L, 3219L), class = "data.frame")

I have a function to find the first occurrence of an event and return the date-time when this event occurred:

fd <- function(x, var) {
  # First date.time recorded at station `var` within this subset
  time <- as.POSIXct(x$date.time[x$station == var][1])
  # Return it as a character string
  paste(as.POSIXct(time, origin = "1970-1-1", tz = "UTC"))
}

I apply this to the data frame with ddply:

ddply(d,'code',fd,"L7")

This finds the date-time at which station "L7" is first recorded for each code and returns a data frame:

   code                  V1
1 10888 2013-05-08 12:14:58
2 10891 2013-05-07 21:45:36

Is there a more efficient way of passing multiple arguments to the function, rather than writing multiple function calls? I would also like to name the output column, so that "V1" above would read "L7". Something like this (which does not work):

ddply(d,'code',fd,c("L7", "L5"))

What I have so far, which works to an extent, is:

data.frame(  
  ddply(d,'code',fd,"L7"),
  ddply(d,'code',fd,"L5")) 

Returns:

   code                  V1 code.1                V1.1
1 10888 2013-05-08 12:14:58  10888 2013-05-08 10:26:56
2 10891 2013-05-07 21:45:36  10891 2013-05-07 17:10:40

As you can see, "code" is repeated and the column names are inappropriate. What I would like in the end is a data.frame with:

   code                  M1                  M2
1 10888 2013-05-08 12:14:58 2013-05-08 10:26:56
2 10891 2013-05-07 21:45:36 2013-05-07 17:10:40

Upvotes: 1

Views: 2476

Answers (2)

SchaunW

Reputation: 3601

There's probably an easier way to do this, but you could combine your use of plyr with reshape2:

require(plyr)
require(reshape2)

d2 <- ddply(d, c("code", "station"), function(df) {
  df[which.min(df$date.time),]
})

d3 <- dcast(d2, code ~ station, value.var = "date.time")

d3

   code         L5         L7
1 10888 1368005216 1368011698
2 10891 1367943040 1367959536

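dcast converts the POSIXct values to plain numbers (seconds since 1970-01-01 UTC), so you'll have to convert them back:

# Station columns all start with "L"; convert each one back to POSIXct.
station.cols <- grepl("^L", colnames(d3))
d3[, station.cols] <- lapply(d3[, station.cols], as.POSIXct,
  origin = "1970-01-01", tz = "UTC")

d3
   code                  L5                  L7
1 10888 2013-05-08 09:26:56 2013-05-08 11:14:58
2 10891 2013-05-07 16:10:40 2013-05-07 20:45:36

The times are printed in UTC here; drop tz = "UTC" to have them printed in your local time zone.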
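As a small check of the conversion, assuming the numbers are seconds since the Unix epoch (1970-01-01 00:00:00 UTC), which is how R stores POSIXct values. The names secs and first.l7 are only illustrative:

```r
# One of the numeric values from the dcast output above (code 10888, station L7).
secs <- 1368011698

# Convert back to a date-time, counting from the Unix epoch and printing in UTC.
first.l7 <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(format(first.l7, "%Y-%m-%d %H:%M:%S"))
```

Any other origin shifts every result by a fixed offset, so it is worth checking one value by hand like this.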

EDIT

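I just thought of an easier way that doesn't require reshape2:

ddply(d, "code", function(df) {
  # Earliest time per station; tapply returns plain numbers, so convert back.
  as.POSIXct(tapply(df$date.time, df$station, min), origin = "1970-01-01")
})

   code                  L5                  L7
1 10888 2013-05-08 09:26:56 2013-05-08 11:14:58
2 10891 2013-05-07 16:10:40 2013-05-07 20:45:36

(Output shown for a UTC session; the times print in your local time zone.)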

All of this assumes that you really want your output to list each station's values in different columns. If you're ok with station identifiers being a separate column by themselves, djhurio's response is simplest.

Upvotes: 2

djhurio

Reputation: 5536

ddply(d, c("code", "station"), head, n = 1)
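This relies on the rows already being in time order within each group, as they are in the sample data, because head(n = 1) simply returns the first row it is given. A rough base-R sketch of the same idea, sorting explicitly and then spreading the stations into columns, might look like this. The data frame is a cut-down copy of the question's sample, and the names d.sorted, first.seen, per.station and wide are only illustrative:

```r
# Cut-down copy of the question's sample data (8 of the 16 rows).
d <- data.frame(
  date.time = as.POSIXct(
    c(1367943040, 1367950947, 1367959536, 1367960275,
      1368005216, 1368005233, 1368011698, 1368011931),
    origin = "1970-01-01", tz = "UTC"),
  station = c("L5", "L5", "L7", "L7", "L5", "L5", "L7", "L7"),
  code = c(10891, 10891, 10891, 10891, 10888, 10888, 10888, 10888),
  stringsAsFactors = FALSE
)

# Sort by time, then keep the first row of each code/station pair.
d.sorted <- d[order(d$date.time), ]
first.seen <- d.sorted[!duplicated(d.sorted[c("code", "station")]), ]

# Build one small data frame per station, with the time column named
# after the station, then merge them all on code.
per.station <- lapply(split(first.seen, first.seen$station), function(s) {
  out <- s[c("code", "date.time")]
  names(out)[2] <- s$station[1]
  out
})
wide <- Reduce(function(a, b) merge(a, b, by = "code"), per.station)
print(wide)
```

Because the columns are built from whatever stations appear in the data, nothing needs to be listed by hand, and merge keeps the POSIXct class, so no conversion back from numbers is needed.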

Upvotes: 1
