David Edwards

Reputation: 33

How can I calculate the median, by factor, for multiple columns?

I must compute the median value of each column in a data set by its factor.

This is the code I have used to get the median of each column, excluding the 'Type' column. Type is the first column in the data frame, and its values are my factor levels.

quant0 = c(0.5)
Median = apply(mydata[2:1051], 2, median, probs = quant0, na.rm = TRUE )

My data frame looks something like this:

     Type    x1    x2    x3  ...
1:  Fresh  1.54  1.48  1.88
2:  Dated  1.46  1.99  1.48
3:  Fresh  2.01  1.02  1.03
...

I want the median values of x1, x2, ... for each level of Type (Fresh and Dated).

Upvotes: 1

Views: 1104

Answers (2)

akrun

Reputation: 886938

We can use group_by() with across() from dplyr: group by 'Type', select the columns whose names start with 'x', and take the median of each.

library(dplyr)
mydata %>%
       group_by(Type) %>%
       # lambda form: passing na.rm through across()'s ... is deprecated since dplyr 1.1.0
       summarise(across(starts_with('x'), ~ median(.x, na.rm = TRUE)))

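Or with quantile(), reusing quant0 (0.5) from the question:

mydata %>%
    group_by(Type) %>%
    # a single probability returns one value per group, here the median
    summarise(across(starts_with('x'), ~ quantile(.x, probs = quant0, na.rm = TRUE)))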
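For anyone who wants to try this without the original data, here is a minimal sketch on a made-up data frame. The name toy and all of its values are hypothetical stand-ins for mydata; only the column layout (Type first, then x columns) follows the question.

```r
library(dplyr)

# Hypothetical toy data standing in for mydata (values invented for illustration)
toy <- data.frame(
  Type = c("Fresh", "Dated", "Fresh", "Dated"),
  x1   = c(1.54, 1.46, 2.01, NA),
  x2   = c(1.48, 1.99, 1.02, 1.50)
)

# One row per level of Type, one median per x column; NAs are dropped per column
res <- toy %>%
  group_by(Type) %>%
  summarise(across(starts_with("x"), ~ median(.x, na.rm = TRUE)))

res
```

Because na.rm = TRUE is applied column by column, the NA in x1 only affects the x1 median of its own group; the x2 value on that row is still used.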

Upvotes: 2

Onyambu

Reputation: 79188

In base R you could use aggregate(). Note that median() has no probs argument; that argument belongs to quantile().

aggregate(.~Type, mydata, median, na.rm = TRUE)
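One thing worth checking if the data contain missing values: the formula method of aggregate() defaults to na.action = na.omit, which drops every row that has an NA in any column before median() is called. The sketch below, on a made-up data frame (toy and its values are hypothetical, not the asker's data), compares that default with na.action = na.pass, which keeps the rows and leaves the NA handling to na.rm = TRUE.

```r
# Hypothetical toy data standing in for mydata (values invented for illustration)
toy <- data.frame(
  Type = c("Fresh", "Dated", "Fresh", "Dated"),
  x1   = c(1.54, 1.46, 2.01, NA),
  x2   = c(1.48, 1.99, 1.02, 1.50)
)

# Default na.action = na.omit: the fourth row is removed entirely,
# so its x2 value never reaches median()
res_default <- aggregate(. ~ Type, data = toy, FUN = median, na.rm = TRUE)

# na.pass keeps every row; median(..., na.rm = TRUE) then skips NAs per column
res_pass <- aggregate(. ~ Type, data = toy, FUN = median, na.rm = TRUE,
                      na.action = na.pass)

res_default
res_pass
```

With many columns, as in the question, the default can silently discard a large share of the rows, so na.pass is probably the safer choice when per-column medians are wanted.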

Upvotes: 2
