Reputation: 175
I have a dataframe containing multiple entries per week. It looks like this:
Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8
The weeks have different numbers of entries, ranging from a single entry for a week up to four entries.
I want to calculate the median of each week for every variable (t_10 through t_30) and output the results in a new data frame. NA cells have already been removed from the original data frame. I have tried several approaches with the ddply function from the plyr package, but without success so far.
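For anyone who wants to try this, the sample table can be rebuilt in R with read.table. The name `df` is an assumption, since the question does not name its data frame. A minimal base R sketch (no packages) that splits the rows by week and takes the median of every column within each week:

```r
# Rebuild the sample data from the question (the name `df` is an assumption)
df <- read.table(header = TRUE, text = "Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8")

# Split the rows into one data frame per week, take the median of every
# column in each piece (the median of Week is just the week itself),
# then stack the per-week vectors back into a single data frame.
per_week <- lapply(split(df, df$Week), function(g) sapply(g, median))
medians <- as.data.frame(do.call(rbind, per_week))
rownames(medians) <- NULL
print(medians)
```

Weeks with an even number of entries get the mean of the two middle values, e.g. week 2 gives 52.15 for t_10.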
Upvotes: 2
Views: 241
Reputation: 1869
You can also use the aggregate function:
# Week goes on the right of ~; "." on the left means all other columns
newdf <- aggregate(. ~ Week, data = df, FUN = median)
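A self-contained sketch of the aggregate approach, assuming the sample data is stored in a data frame called `df`. In the formula interface the grouping variable belongs on the right-hand side of `~`, and `.` on the left stands for every remaining column, so all of t_10 to t_30 are summarised at once:

```r
# Sample data from the question (the name `df` is an assumption)
df <- read.table(header = TRUE, text = "Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8")

# One row per week, one median column per variable
newdf <- aggregate(. ~ Week, data = df, FUN = median)
print(newdf)
```

Note that the formula method uses `na.action = na.omit` by default, which drops a whole row if any column is NA. If NAs were still present and should only be skipped per column, `aggregate(. ~ Week, data = df, FUN = median, na.rm = TRUE, na.action = na.pass)` keeps the other values of those rows.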
Upvotes: 0
Reputation: 886938
We could use summarise_at for multiple columns:
library(dplyr)
colsToKeep <- c("t_10", "t_30")
# all_of() makes it explicit that colsToKeep is an external character vector
df1 %>%
  group_by(Week) %>%
  summarise_at(vars(all_of(colsToKeep)), median)
# A tibble: 4 x 3
# Week t_10 t_30
# <int> <dbl> <dbl>
#1 1 51.40 5.60
#2 2 52.15 6.15
#3 3 52.20 5.90
#4 4 52.05 5.90
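Since dplyr 1.0.0 the scoped verbs such as summarise_at are superseded by across(). A minimal sketch, assuming dplyr 1.0.0 or later and the question's data stored as `df1`, that computes the median of every column whose name starts with "t_", which covers all of t_10 to t_30 without listing them:

```r
library(dplyr)

# Sample data from the question, stored as df1 to match the answer above
df1 <- read.table(header = TRUE, text = "Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8")

# across() applies median() to each selected column within each Week group;
# .groups = "drop" returns an ungrouped tibble
medians <- df1 %>%
  group_by(Week) %>%
  summarise(across(starts_with("t_"), median), .groups = "drop")
print(medians)
```

To restrict the result to specific columns, `across(all_of(colsToKeep), median)` works the same way.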
Upvotes: 2
Reputation: 28309
Specify the variables to keep in colsToKeep and store the input table in d:
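library(tidyverse)
colsToKeep <- c("t_10", "t_30")
# reshape to long format: one row per Week / variable / value
gather(d, variable, value, -Week) %>%
  # keep only the requested variables
  filter(variable %in% colsToKeep) %>%
  # one median per week and variable
  group_by(Week, variable) %>%
  summarise(median = median(value))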
# A tibble: 8 x 3
# Groups: Week [4]
# Week variable median
# <int> <chr> <dbl>
#1 1 t_10 51.40
#2 1 t_30 5.60
#3 2 t_10 52.15
#4 2 t_30 6.15
#5 3 t_10 52.20
#6 3 t_30 5.90
#7 4 t_10 52.05
#8 4 t_30 5.90
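In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same long-format result under that assumption, with the data stored in `d` as in this answer; selecting the columns before pivoting replaces the filter() step, and `.groups = "drop"` returns an ungrouped tibble instead of one grouped by Week:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, stored as d to match this answer
d <- read.table(header = TRUE, text = "Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8")
colsToKeep <- c("t_10", "t_30")

long_medians <- d %>%
  select(Week, all_of(colsToKeep)) %>%          # keep Week plus the chosen columns
  pivot_longer(-Week, names_to = "variable", values_to = "value") %>%
  group_by(Week, variable) %>%
  summarise(median = median(value), .groups = "drop")
print(long_medians)
```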
Upvotes: 1