Roggan

Reputation: 175

Calculate medians of rows in a grouped dataframe

I have a dataframe containing multiple entries per week. It looks like this:

Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9  5.6
2 51.9 37.8 25.8 20.4 12.3  6.2
2 52.4 38.5 26.2 20.5 12.3  6.1
3 52.2 38.6 26.1 20.4 12.4  5.9
4 52.2 38.3 26.1 20.2 12.1  5.9
4 52.7 38.4 25.8 20.0 12.1  5.9
4 51.1 37.8 25.7 20.0 12.2  6.0
4 51.9 38.0 26.0 19.8 12.0  5.8

The weeks have different numbers of entries, ranging from one to four per week. I want to calculate the median for each week across all the variables (t_10 through t_30) and output the result in a new dataframe. NA cells have already been omitted from the original dataframe. I have tried different approaches using the ddply function from the plyr package, but to no avail so far.

Upvotes: 2

Views: 241

Answers (3)

DataTx

Reputation: 1869

You can also use the aggregate function:

   newdf <- aggregate(. ~ Week, data = df, FUN = median)
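As a minimal sketch of the aggregate approach, the example below rebuilds the sample data from the question (the name df is taken from this answer) and puts Week on the right-hand side of the formula so that it acts as the grouping variable; the dot on the left stands for every other column.

```r
# Sample data copied from the question
df <- data.frame(
  Week = c(1, 2, 2, 3, 4, 4, 4, 4),
  t_10 = c(51.4, 51.9, 52.4, 52.2, 52.2, 52.7, 51.1, 51.9),
  t_15 = c(37.8, 37.8, 38.5, 38.6, 38.3, 38.4, 37.8, 38.0),
  t_18 = c(25.6, 25.8, 26.2, 26.1, 26.1, 25.8, 25.7, 26.0),
  t_20 = c(19.7, 20.4, 20.5, 20.4, 20.2, 20.0, 20.0, 19.8),
  t_25 = c(11.9, 12.3, 12.3, 12.4, 12.1, 12.1, 12.2, 12.0),
  t_30 = c(5.6, 6.2, 6.1, 5.9, 5.9, 5.9, 6.0, 5.8)
)

# Median of every other column, grouped by Week
newdf <- aggregate(. ~ Week, data = df, FUN = median)
print(newdf)
```

This should give one row per week with a median for each of t_10 through t_30; for week 2, for instance, t_10 is the mean of 51.9 and 52.4, i.e. 52.15. Note that the formula interface drops rows containing NA by default, which is harmless here because the question says NAs are already removed.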

Upvotes: 0

akrun

Reputation: 886938

We could use summarise_at to summarise multiple columns at once:

library(dplyr)
colsToKeep <- c("t_10", "t_30")
df1 %>%
   group_by(Week) %>%
   summarise_at(vars(colsToKeep), median) 
# A tibble: 4 x 3
#   Week  t_10  t_30
#  <int> <dbl> <dbl>
#1     1 51.40  5.60
#2     2 52.15  6.15
#3     3 52.20  5.90
#4     4 52.05  5.90
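Since the question asks for medians of all columns from t_10 through t_30, here is a sketch that selects every column whose name starts with "t_" instead of listing them by hand. It assumes dplyr 1.0.0 or later, where across() is available, and rebuilds the question's sample data as df1; the result name medians is only illustrative.

```r
library(dplyr)

# Sample data copied from the question
df1 <- read.table(header = TRUE, text = "
Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8
")

# One row per Week, one median per t_* column
medians <- df1 %>%
  group_by(Week) %>%
  summarise(across(starts_with("t_"), median))
print(medians)
```

The t_10 and t_30 columns of the result should match the output shown above, with the remaining four columns added alongside them.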

Upvotes: 2

pogibas

Reputation: 28309

Specify the variables to keep in colsToKeep and store the input table in d:

library(tidyverse)
colsToKeep <- c("t_10", "t_30")
gather(d, variable, value, -Week) %>%
    filter(variable %in% colsToKeep) %>%
    group_by(Week, variable) %>%
    summarise(median = median(value))

# A tibble: 8 x 3
# Groups:   Week [4]
#    Week variable median
#   <int>    <chr>  <dbl>
# 1     1     t_10  51.40
# 2     1     t_30   5.60
# 3     2     t_10  52.15
# 4     2     t_30   6.15
# 5     3     t_10  52.20
# 6     3     t_30   5.90
# 7     4     t_10  52.05
# 8     4     t_30   5.90

Upvotes: 1
