claudiadast

Reputation: 611

How to keep other columns when using aggregate in R?

I have a dataframe, p4p5, that contains the following columns:

names(p4p5)  # "SampleID" "expr" "Gene" "Period" "Consequence" "isPTV"

I've used the aggregate function here to find the median expression per Gene:

p4p5_med <- aggregate(expr ~ Gene, p4p5, median)

However, this returns a data frame with only the columns "Gene" and "expr", with one row per gene. How can I keep all the original rows and columns and add the per-gene median as a new column?
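If you want to stay with aggregate(), one option is to compute the per-gene medians first and then merge() them back onto the original rows by Gene. This is a minimal sketch that assumes the sample rows shown in the update (re-typed here as a data frame) and uses the column name Median to match the expected output. Note that merge() sorts rows by the key and moves it to the first column, so the last step restores the original column order.

```r
# Sample rows re-typed from the question (assumed; the real p4p5 is larger)
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027", "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained", "upstream_gene_variant",
                  "intron_variant", "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L)
)

# One row per Gene holding its median expression
p4p5_med <- aggregate(expr ~ Gene, data = p4p5, FUN = median)
names(p4p5_med)[names(p4p5_med) == "expr"] <- "Median"

# Attach the median to every original row; all.x = TRUE keeps genes whose
# expr values are all NA (the formula method of aggregate drops those)
p4p5_full <- merge(p4p5, p4p5_med, by = "Gene", all.x = TRUE)

# merge() puts the key column first; restore the original column order
p4p5_full <- p4p5_full[, c(names(p4p5), "Median")]
p4p5_full
```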

UPDATE:

Input (p4p5):

SampleID   expr  Gene        Period  Consequence            isPTV
HSB430    -1.23  ENSG000098  4       upstream_gene_variant  0
HSB321    -0.02  ENSG000098  5       stop_gained            1
HSB296     3.12  ENSG000027  4       upstream_gene_variant  0
HSB201     1.22  ENSG000027  4       intron_variant         0
HSB220     0.13  ENSG000013  6       intron_variant         0

Expected output:

SampleID   expr  Gene        Period  Consequence            isPTV  Median
HSB430    -1.23  ENSG000098  4       upstream_gene_variant  0      -0.625
HSB321    -0.02  ENSG000098  5       stop_gained            1      -0.625
HSB296     3.12  ENSG000027  4       upstream_gene_variant  0       2.17
HSB201     1.22  ENSG000027  4       intron_variant         0       2.17
HSB220     0.13  ENSG000013  6       intron_variant         0       0.13

Upvotes: 2

Views: 2185

Answers (1)

neilfws

Reputation: 33782

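I'd use dplyr for this. Unlike aggregate(), which collapses each group to a single row, group_by() followed by mutate() keeps every row and adds the group median as a new column: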

library(dplyr)

p4p5 %>% 
  group_by(Gene) %>% 
  mutate(Median = median(expr, na.rm = TRUE)) %>%
  ungroup()

  SampleID  expr Gene       Period Consequence           isPTV Median
  <chr>    <dbl> <chr>       <int> <chr>                 <int>  <dbl>
1 HSB430   -1.23 ENSG000098      4 upstream_gene_variant     0 -0.625
2 HSB321   -0.02 ENSG000098      5 stop_gained               1 -0.625
3 HSB296    3.12 ENSG000027      4 upstream_gene_variant     0  2.17 
4 HSB201    1.22 ENSG000027      4 intron_variant            0  2.17 
5 HSB220    0.13 ENSG000013      6 intron_variant            0  0.13
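For comparison, a base-R sketch without dplyr can use ave(), which applies a function within groups and returns a vector the same length as its input, so the original row order and all other columns are kept. The data frame below re-types the question's sample rows (an assumption; substitute the real p4p5).

```r
# Sample rows re-typed from the question (assumed; the real p4p5 is larger)
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027", "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained", "upstream_gene_variant",
                  "intron_variant", "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L)
)

# ave() returns one value per row: the median of that row's Gene group,
# ignoring missing expression values like the dplyr version above
p4p5$Median <- ave(p4p5$expr, p4p5$Gene,
                   FUN = function(x) median(x, na.rm = TRUE))
p4p5
```

Because ave() never reorders or drops rows, this matches the expected output in the question row for row.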

Upvotes: 1
