Reputation: 557
While browsing other questions I have almost solved my problem, but I am failing at the last hurdle.
I am using R.
I have a data frame (d) that I pass through a function (fd) with ddply from the plyr package; this returns a data frame, as expected. In my actual data frame there is a large number of values I want to pass into the function as arguments, and I would like to do that in one go rather than calling it multiple times. I would also like to give relevant column names to the output data frame. I have tried to show my workings step by step below.
Sample data:
d<-structure(list(date.time = structure(c(1367943040, 1367950947,
1367950965, 1367950987, 1367951028, 1367951045, 1367959536, 1367960275,
1367960413, 1367985859, 1368005216, 1368005233, 1368011698, 1368011931,
1368012615, 1368033855), tzone = "", class = c("POSIXct", "POSIXt"
)), station = c("L5", "L5", "L5", "L5", "L5", "L5", "L7", "L7",
"L7", "L7", "L5", "L5", "L7", "L7", "L7", "L7"), code = c(10891,
10891, 10891, 10891, 10891, 10891, 10891, 10891, 10891, 10891,
10888, 10888, 10888, 10888, 10888, 10888)), .Names = c("date.time",
"station", "code"), row.names = c(2421L, 2466L, 2467L, 2468L,
2469L, 2470L, 2472L, 2473L, 2474L, 2812L, 2837L, 2838L, 2859L,
2860L, 2861L, 3219L), class = "data.frame")
I have a function to find the first occurrence of an event and return the datetime when this event occurred:
fd<- function(x, var){
time<- (as.POSIXct(x$date.time [x$station == var] [1]))
paste (as.POSIXct (time, origin="1970-1-1", tz='UTC'))
}
I pass this to the dataframe:
ddply(d,'code',fd,"L7")
This finds the datetime at which station "L7" is first recorded for each code and returns a data frame:
code V1
1 10888 2013-05-08 12:14:58
2 10891 2013-05-07 21:45:36
Is there a more efficient way of passing multiple arguments to the function than writing multiple function calls? I would also like to name the column, so that "V1" above would read "L7". Something like this (which does not work):
ddply(d,'code',fd,c("L7", "F5"))
What I have so far, which works to an extent, is:
data.frame(
ddply(d,'code',fd,"L7"),
ddply(d,'code',fd,"L5"))
Returns:
code V1 code.1 V1.1
1 10888 2013-05-08 12:14:58 10888 2013-05-08 10:26:56
2 10891 2013-05-07 21:45:36 10891 2013-05-07 17:10:40
As you can see, "code" is repeated and the column names are inappropriate. What I would like in the end is a data.frame with:
code M1 M2
1 10888 2013-05-08 12:14:58 2013-05-08 10:26:56
2 10891 2013-05-07 21:45:36 2013-05-07 17:10:40
Upvotes: 1
Views: 2476
Reputation: 3601
There's probably an easier way to do this, but you could combine your use of plyr with reshape2:
require(plyr)
require(reshape2)
d2 <- ddply(d, c("code", "station"), function(df) {
df[which.min(df$date.time),]
})
d3 <- dcast(d2, code ~ station, value.var = "date.time")
d3
code L5 L7
1 10888 1368005216 1368011698
2 10891 1367943040 1367959536
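dcast drops the POSIXct class and returns the times as plain numbers (seconds since 1970-01-01 UTC), so you'll have to convert them back:
d3[,grepl("^L", colnames(d3))] <- lapply(d3[,grepl("^L", colnames(d3))], as.POSIXct,
origin="1970-01-01")
d3
code L5 L7
1 10888 2013-05-08 09:26:56 2013-05-08 11:14:58
2 10891 2013-05-07 16:10:40 2013-05-07 20:45:36
(The times print in your session's time zone; the values above are for a session set to UTC.)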
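As a quick sanity check on the conversion step: the numbers dcast returns are seconds since the Unix epoch, so a single value can be converted by hand. This is only a minimal sketch; `tz = "UTC"` is an assumption made here so the printed result does not depend on the local time zone, and `secs` and `first_seen` are illustrative names.

```r
# Seconds since the Unix epoch (1970-01-01 00:00:00 UTC), taken from the
# L5 column of the dcast output for code 10888.
secs <- 1368005216

# tz = "UTC" is an assumption for reproducible printing; drop it to print
# in the session's local time zone instead.
first_seen <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(first_seen, "%Y-%m-%d %H:%M:%S")
# "2013-05-08 09:26:56"
```

The result falls on the same day as the datetimes in the question, which is a cheap way to confirm that the origin passed to as.POSIXct is the right one.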
EDIT
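I just thought of an easier way that doesn't require reshape2:
ddply(d, "code", function(df) {
as.POSIXct(tapply(df$date.time, df$station, min), origin="1970-01-01")
})
code L5 L7
1 10888 2013-05-08 09:26:56 2013-05-08 11:14:58
2 10891 2013-05-07 16:10:40 2013-05-07 20:45:36
(The times print in your session's time zone; the values above are for a session set to UTC.)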
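For comparison, the same wide layout can probably also be built in base R alone, without plyr or reshape2. The following is a hedged sketch rather than a tested drop-in: it rebuilds the question's sample data in compact form, `first_secs` and `wide` are illustrative names, and `tz = "UTC"` is an assumption made only for reproducible printing.

```r
# Compact copy of the question's sample data (same values, rebuilt here so
# the block runs on its own).
d <- data.frame(
  date.time = as.POSIXct(
    c(1367943040, 1367950947, 1367950965, 1367950987, 1367951028, 1367951045,
      1367959536, 1367960275, 1367960413, 1367985859, 1368005216, 1368005233,
      1368011698, 1368011931, 1368012615, 1368033855),
    origin = "1970-01-01", tz = "UTC"),
  station = c(rep("L5", 6), rep("L7", 4), rep("L5", 2), rep("L7", 4)),
  code = rep(c(10891, 10888), times = c(10, 6)),
  stringsAsFactors = FALSE
)

# Earliest timestamp per code (rows) and station (columns), as plain seconds.
first_secs <- tapply(as.numeric(d$date.time), list(d$code, d$station), min)

# Build the wide data frame one station column at a time, converting the
# seconds back to POSIXct as each column is added.
wide <- data.frame(code = as.numeric(rownames(first_secs)))
for (st in colnames(first_secs)) {
  wide[[st]] <- as.POSIXct(unname(first_secs[, st]),
                           origin = "1970-01-01", tz = "UTC")
}
wide
```

The station columns can be renamed afterwards with `names(wide)` if labels other than the station identifiers are wanted.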
All of this assumes that you really want your output to list each station's values in different columns. If you're ok with station identifiers being a separate column by themselves, djhurio's response is simplest.
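To illustrate what the long layout looks like, with station as a column of its own, here is a minimal base-R sketch using aggregate. It is not djhurio's answer, which is not reproduced here. The names `long`, `secs` and `first.seen` are made up for this example, and `tz = "UTC"` is assumed only so the printed times are reproducible.

```r
# Compact copy of the question's sample data (same values, rebuilt here so
# the block runs on its own).
d <- data.frame(
  date.time = as.POSIXct(
    c(1367943040, 1367950947, 1367950965, 1367950987, 1367951028, 1367951045,
      1367959536, 1367960275, 1367960413, 1367985859, 1368005216, 1368005233,
      1368011698, 1368011931, 1368012615, 1368033855),
    origin = "1970-01-01", tz = "UTC"),
  station = c(rep("L5", 6), rep("L7", 4), rep("L5", 2), rep("L7", 4)),
  code = rep(c(10891, 10888), times = c(10, 6)),
  stringsAsFactors = FALSE
)

# One row per (code, station) pair holding the earliest timestamp.
# The minimum is taken on plain seconds so the result does not depend on
# how aggregate() treats the POSIXct class, then converted back.
long <- aggregate(secs ~ code + station,
                  data = transform(d, secs = as.numeric(date.time)),
                  FUN = min)
long$first.seen <- as.POSIXct(long$secs, origin = "1970-01-01", tz = "UTC")
long$secs <- NULL
long
```

This yields four rows for the sample data, one per combination of code and station, and it needs no reshaping step when new stations appear.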
Upvotes: 2