Reputation: 611
I have a data frame, p4p5, that contains the following columns:
names(p4p5)
# "SampleID", "expr", "Gene", "Period", "Consequence", "isPTV"
I've used the aggregate function here to find the median expression per Gene:
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
However, this results in a dataframe with the columns "expr" and "Gene" only. How can I still retain all the original columns when applying the aggregate function?
UPDATE:
Input (p4p5):
SampleID expr Gene Period Consequence isPTV
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0
HSB321 -0.02 ENSG000098 5 stop_gained 1
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0
HSB201 1.22 ENSG000027 4 intron_variant 0
HSB220 0.13 ENSG000013 6 intron_variant 0
Expected output:
SampleID expr Gene Period Consequence isPTV Median
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
Upvotes: 2
Views: 2185
Reputation: 33782
I'd use dplyr for this:
library(dplyr)
p4p5 %>%
  group_by(Gene) %>%
  mutate(Median = median(expr, na.rm = TRUE)) %>%
  ungroup()
SampleID expr Gene Period Consequence isPTV Median
<chr> <dbl> <chr> <int> <chr> <int> <dbl>
1 HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
2 HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
3 HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
4 HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
5 HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
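If you'd rather stay in base R, here is a rough sketch of two ways that should give the same per-gene median while keeping every original column. The data frame is rebuilt from the input in the question. The names with_ave and with_merge are just placeholders I made up for illustration, and I'm assuming the column types are character, numeric and integer as in the printed output.

```r
# Rebuild the example input from the question (values copied from the post).
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027",
                  "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained",
                  "upstream_gene_variant", "intron_variant",
                  "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Option 1: ave() returns one value per input row, so all columns and
# the original row order are kept.
with_ave <- p4p5
with_ave$Median <- ave(with_ave$expr, with_ave$Gene,
                       FUN = function(x) median(x, na.rm = TRUE))

# Option 2: keep the aggregate() call from the question, rename its
# result column, and join it back onto the full data on Gene.
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
names(p4p5_med)[names(p4p5_med) == "expr"] <- "Median"
# merge() sorts by Gene and moves Gene to the first column by default.
with_merge <- merge(p4p5, p4p5_med, by = "Gene")
```

The ave() route is probably closer to the expected output, since it leaves row and column order untouched. The merge() route reuses the aggregate() result but reorders the rows and columns.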
Upvotes: 1