Reputation: 3760
I need to calculate the median of one column over the first 2 minutes and over the last 2 minutes of each group.
Here is the sample data:
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
"2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51",
"2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")
So in this case I need the median of column x over the first 2 minutes and the last 2 minutes of Time, for each level of y:
level a, beginning ("2015-08-21T10:00:51", "2015-08-21T10:02:51"): x = 38.855, 38.664, median = 38.7595
level a, end ("2015-08-21T10:08:51", "2015-08-21T10:10:51"): x = 40.195, 40.386, median = 40.2905
level b, beginning ("2015-08-21T10:12:51", "2015-08-21T10:14:51"): x = 40.386, 40.195, median = 40.2905
level b, end ("2015-08-21T10:20:51", "2015-08-21T10:22:51"): x = 38.664, 40.386, median = 39.525
Ideally, the result would be a new data.frame like:
y median1 median2
a 38.7595 40.2905
b 40.2905 39.525
The windows have to be defined by the Time column, not by row counts, because in my original data the number of rows differs between groups.
Thanks for all ideas and help!
Upvotes: 1
Views: 46
Reputation: 54237
One way to do it (if I got you right):
as.data.frame(as.list(
  aggregate(x ~ y, data[order(data$Time), ], function(x)
    # median of the first two and of the last two rows in each group
    c(med1 = median(head(x, 2)), med2 = median(tail(x, 2)))
  )
))
# y x.med1 x.med2
# 1 a 38.7595 40.2905
# 2 b 40.2905 39.5250
I don't see why the grouping has to be done on data$Time though. Here, it's on data$y. Replace data[order(data$Time), ] by data if the data set is already ordered by time.
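If the windows really have to be defined by elapsed time rather than by a fixed number of rows, a minimal base R sketch could look like the one below. It assumes that "2 minutes" means every row within 120 seconds (inclusive) of the group's first or last timestamp. The names window_secs and window_medians are made up for this illustration, and tz = "UTC" is added only so the timestamps parse the same way everywhere.

```r
# Sample data, copied from the question
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
          "2015-08-21T10:08:51", "2015-08-21T10:10:51", "2015-08-21T10:12:51", "2015-08-21T10:14:51",
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time, x, y)
# tz = "UTC" is an assumption, not part of the original data
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Assumed window length: 2 minutes, end points included
window_secs <- 120

# Hypothetical helper: medians of x within `secs` seconds of the
# first and of the last timestamp of one group
window_medians <- function(d, secs) {
  from_start <- as.numeric(difftime(d$Time, min(d$Time), units = "secs"))
  to_end     <- as.numeric(difftime(max(d$Time), d$Time, units = "secs"))
  data.frame(
    y       = d$y[1],
    median1 = median(d$x[from_start <= secs]),
    median2 = median(d$x[to_end <= secs])
  )
}

# Apply the helper to each level of y and stack the results
result <- do.call(rbind, lapply(split(data, data$y), window_medians, secs = window_secs))
rownames(result) <- NULL
result
#   y median1 median2
# 1 a 38.7595 40.2905
# 2 b 40.2905 39.5250
```

With the sample data this gives the same numbers as the row-based version, because every group is sampled every 2 minutes. The two would differ as soon as the spacing between rows is irregular.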
For more than one variable, try
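library(dplyr)  # needs dplyr >= 1.0.0 for across() and .add
data %>%
  arrange(Time) %>%
  group_by(y) %>%
  select(-Time) %>%
  # keep the first two and the last two rows of each group
  filter(row_number() %in% c(1, 2, n() - 1, n())) %>%
  mutate(f = as.factor(rep(c("head", "tail"), each = 2))) %>%
  # add f to the existing grouping by y
  group_by(f, .add = TRUE) %>%
  # median of every remaining column per (y, f)
  summarise(across(everything(), median))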
Upvotes: 1