Reputation: 3760
I am trying to calculate the median of the first 2 minutes and of the last 2 minutes of each group defined by one of the columns. To make this clearer, here is some sample data:
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
"2015-08-21T10:08:51", "2015-08-21T10:10:51","2015-08-21T10:12:51", "2015-08-21T10:14:51",
"2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386, 40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time,x,y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S")
So in this case I want the median of column x over the 2 minutes of Time at the beginning and at the end of each level of y.
For level a, the beginning is "2015-08-21T10:00:51", "2015-08-21T10:02:51" (x = 38.855, 38.664, median = 38.7595) and the end is "2015-08-21T10:08:51", "2015-08-21T10:10:51" (x = 40.195, 40.386, median = 40.2905).
For level b, the beginning is "2015-08-21T10:12:51", "2015-08-21T10:14:51" (x = 40.386, 40.195, median = 40.2905) and the end is "2015-08-21T10:20:51", "2015-08-21T10:22:51" (x = 38.664, 40.386, median = 39.525).
Ideally the result would be a new data.frame like:
y median1 median2
a 38.7595 40.2905
b 40.2905 39.525
Thanks for any help!
Cheers
Upvotes: 0
Views: 53
Reputation: 10483
Using the dplyr and tidyr packages, you can do something like this:
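library(dplyr)
library(tidyr)

data %>%
  group_by(y) %>%
  # keep the first two and the last two rows of each group
  slice(c(1, 2, n() - 1, n())) %>%
  # label rows 1-2 as the first window and rows 3-4 as the second
  mutate(firstGroup = ifelse(row_number() < 3, 'medianGroup1', 'medianGroup2')) %>%
  group_by(y, firstGroup) %>%
  summarise(medianValue = median(x)) %>%
  # one column per window
  spread(firstGroup, medianValue)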
Output looks as follows:
Source: local data frame [2 x 3]

       y medianGroup1 medianGroup2
  (fctr)        (dbl)        (dbl)
1      a      38.7595      40.2905
2      b      40.2905      39.5250
Note, I am showing each step explicitly in the code, but it could be condensed further.
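As a rough sketch of what a condensed version might look like (not necessarily what was meant above), the slicing and labelling steps can be folded into a single summarise() call with head() and tail(). The column names median1 and median2 are taken from the desired output in the question. The second variant is an assumption on my part: it selects rows by timestamp (within 120 seconds of each group's first and last Time) instead of by row position, which should matter only if the readings are not evenly spaced. The tz = "UTC" argument is also my addition, to keep the example reproducible.

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch runs on its own
Time <- c("2015-08-21T10:00:51", "2015-08-21T10:02:51", "2015-08-21T10:04:51", "2015-08-21T10:06:51",
          "2015-08-21T10:08:51", "2015-08-21T10:10:51", "2015-08-21T10:12:51", "2015-08-21T10:14:51",
          "2015-08-21T10:16:51", "2015-08-21T10:18:51", "2015-08-21T10:20:51", "2015-08-21T10:22:51")
x <- c(38.855, 38.664, 40.386, 40.386, 40.195, 40.386,
       40.386, 40.195, 40.386, 38.855, 38.664, 40.386)
y <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
data <- data.frame(Time, x, y)
data$Time <- as.POSIXct(data$Time, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Variant 1: by row position (first two and last two rows of each group)
by_position <- data %>%
  group_by(y) %>%
  summarise(median1 = median(head(x, 2)),
            median2 = median(tail(x, 2)))

# Variant 2 (assumption): by time window, i.e. readings within 120 seconds
# of each group's first and last timestamp
by_time <- data %>%
  group_by(y) %>%
  summarise(median1 = median(x[Time <= min(Time) + 120]),
            median2 = median(x[Time >= max(Time) - 120]))

by_position
by_time
```

On the sample data both variants should give the same two rows, because the readings are exactly 2 minutes apart. Variant 1 needs the rows of each group to be sorted by Time; variant 2 does not.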
Upvotes: 1