Mal_a

Reputation: 3760

R: Grouping Time by minutes JUST at the beginning and end of each group

I need to calculate the median of one column over the first 2 minutes and the last 2 minutes of each group.

Here is the sample data:

Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51", 
          "2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51", 
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <-  c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <-  c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")

In this case I need the median of column x over the first 2 minutes and the last 2 minutes of Time within each level of y:

- level a, beginning: "2015-08-21T10:00:51", "2015-08-21T10:02:51", so x = 38.855, 38.664 and median = 38.7595
- level a, end: "2015-08-21T10:08:51", "2015-08-21T10:10:51", so x = 40.195, 40.386 and median = 40.2905
- level b, beginning: "2015-08-21T10:12:51", "2015-08-21T10:14:51", so x = 40.386, 40.195 and median = 40.2905
- level b, end: "2015-08-21T10:20:51", "2015-08-21T10:22:51", so x = 38.664, 40.386 and median = 39.525

Ideally the result would be a new data.frame like:

y median1 median2
a 38.7595 40.2905
b 40.2905 39.525

The windows have to be defined on the Time column, not on row counts, because in my original data the number of rows differs between groups.

Thanks for all ideas and help!

Upvotes: 1

Views: 46

Answers (1)

lukeA

Reputation: 54237

One way to do it (if I got you right):

# Median of the first two and last two x values per level of y, after sorting by Time
as.data.frame(as.list( 
  aggregate(x~y, data[order(data$Time), ], function(x) 
    c(med1=median(head(x, 2)), med2=median(tail(x, 2)))
  ) 
))
#   y  x.med1  x.med2
# 1 a 38.7595 40.2905
# 2 b 40.2905 39.5250

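I don't see why the grouping has to be done on data$Time though. Here it is done on data$y, and the first and last two rows of each group (once sorted by time) are used. With the evenly spaced sample data that is the same as a 2-minute window. Replace data[order(data$Time), ] with data if the data set is already ordered by time.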
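
If the rows are not evenly spaced, a window defined on Time itself may be closer to what the question asks for. The following is only a minimal sketch under some assumptions: the helper edge_medians and the parameter window_secs are hypothetical names introduced here, the window is taken as inclusive (rows at exactly 120 seconds from the first or last timestamp are kept), and tz = "UTC" is set only to keep the example reproducible.

```r
# Sample data from the question
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
          "2015-08-21T10:08:51", "2015-08-21T10:10:51", "2015-08-21T10:12:51", "2015-08-21T10:14:51",
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time, x, y)
# tz = "UTC" is an assumption made here for reproducibility
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Hypothetical helper: medians of x inside the first and last `window_secs`
# seconds of one group (inclusive at both window edges)
edge_medians <- function(d, window_secs) {
  first <- d$x[d$Time <= min(d$Time) + window_secs]  # POSIXct + number adds seconds
  last  <- d$x[d$Time >= max(d$Time) - window_secs]
  data.frame(y = as.character(d$y[1]),
             median1 = median(first),
             median2 = median(last),
             stringsAsFactors = FALSE)
}

window_secs <- 120  # 2 minutes
result <- do.call(rbind, lapply(split(data, data$y), edge_medians,
                                window_secs = window_secs))
rownames(result) <- NULL
result
```

For the sample data this gives 38.7595 and 40.2905 for level a, and 40.2905 and 39.525 for level b, which matches the table in the question. Unlike head(x, 2) and tail(x, 2), the number of rows that fall into each window can differ between groups.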


For more than one variable, try

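library(dplyr)  # across() and .add need dplyr >= 1.0.0
data %>% 
  arrange(Time) %>% 
  group_by(y) %>% 
  select(-Time) %>% 
  # keep the first two and last two rows of each group (assumes at least 4 rows per group)
  filter(row_number() %in% c(1, 2, n()-1, n())) %>%
  mutate(f = as.factor(rep(c("head", "tail"), each = 2))) %>%
  group_by(f, .add = TRUE) %>%
  # median of every remaining (non-grouping) column
  summarise(across(everything(), median))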

Upvotes: 1
