Reputation: 139
I have data in wide format and I would like to have it summarized with e.g. median and IQR in long format. The summarizing of the long format works fine but then I need to clean and convert the column names to make rows out of them.
How can this be achieved?
MWE with desired output:
require(data.table)
# Demo data
dt <- data.table("c" = c(1,2,3,1,1,2,3,2), "323.1" = c(1,2,3,4,1,3,4,8), "454.3" = c(1,3,2,4,5,6,7,8)) # Real data has plenty more columns in the style "number.number"
# Create summary
dt[, unlist(recursive=FALSE, lapply(
.(med = median, iqr = IQR),
function(f) lapply(.SD, f)
)), by=.(c)]
# Desired output:
(dt_output <- data.table("c" = c(1,1,2,2,3,3), "var" = c("323.1", "454.3"), "med" = c(1,3,3.5,4,6,4.5), "iqr" = c(1.5,3,0.5,2,2.5,2.5)))
Output after summary:
c med.323.1 med.454.3 iqr.323.1 iqr.454.3
1: 1 1.0 4.0 1.5 2.0
2: 2 3.0 6.0 3.0 2.5
3: 3 3.5 4.5 0.5 2.5
Desired output:
c var med iqr
1: 1 323.1 1.0 1.5
2: 1 454.3 3.0 3.0
3: 2 323.1 3.5 0.5
4: 2 454.3 4.0 2.0
5: 3 323.1 6.0 2.5
6: 3 454.3 4.5 2.5
Thanks!
Upvotes: 0
Views: 583
Reputation: 2102
You can use gather()
to reashape your data, then summarise
to get the statistics like this:
library(tidyverse)
dt %>% gather(., var, value, -c) %>%
group_by(c, var) %>% summarise(med = median(value),
iqr = IQR(value))
# A tibble: 6 x 4
# Groups: c [3]
c var med iqr
<dbl> <chr> <dbl> <dbl>
1 1 323.1 1 1.5
2 1 454.3 4 2
3 2 323.1 3 3
4 2 454.3 6 2.5
5 3 323.1 3.5 0.5
6 3 454.3 4.5 2.5
Upvotes: 1
Reputation: 25225
An option is to melt first before performing your calculations:
melt(dt, id.vars="c")[, .(med = median(value), iqr = IQR(value)), .(c, variable)]
output:
c variable med iqr
1: 1 323.1 1.0 1.5
2: 2 323.1 3.0 3.0
3: 3 323.1 3.5 0.5
4: 1 454.3 4.0 2.0
5: 2 454.3 6.0 2.5
6: 3 454.3 4.5 2.5
Upvotes: 1