user2894356
user2894356

Reputation: 139

R: How to summarize a data.table in long format using the column names of the original wide table as a rows?

I have data in wide format and I would like to have it summarized with e.g. median and IQR in long format. The summarizing of the long format works fine but then I need to clean and convert the column names to make rows out of them.

How can this be achieved?

MWE with desired output:

require(data.table)

# Demo data
dt <- data.table("c" = c(1,2,3,1,1,2,3,2), "323.1" = c(1,2,3,4,1,3,4,8), "454.3" = c(1,3,2,4,5,6,7,8)) # Real data has plenty more columns in the style "number.number"

# Create summary
dt[, unlist(recursive=FALSE, lapply(
  .(med = median, iqr = IQR),
  function(f) lapply(.SD, f)
)), by=.(c)]


# Desired output:
(dt_output <- data.table("c" = c(1,1,2,2,3,3), "var" = c("323.1", "454.3"), "med" = c(1,3,3.5,4,6,4.5), "iqr" = c(1.5,3,0.5,2,2.5,2.5)))

Output after summary:

       c med.323.1 med.454.3 iqr.323.1 iqr.454.3
1: 1       1.0       4.0       1.5       2.0
2: 2       3.0       6.0       3.0       2.5
3: 3       3.5       4.5       0.5       2.5

Desired output:

   c   var med iqr
1: 1 323.1 1.0 1.5
2: 1 454.3 3.0 3.0
3: 2 323.1 3.5 0.5
4: 2 454.3 4.0 2.0
5: 3 323.1 6.0 2.5
6: 3 454.3 4.5 2.5

Thanks!

Upvotes: 0

Views: 583

Answers (2)

David Jorquera
David Jorquera

Reputation: 2102

You can use gather() to reashape your data, then summarise to get the statistics like this:

library(tidyverse)

dt %>% gather(., var, value, -c) %>% 
  group_by(c, var) %>% summarise(med = median(value),
                                 iqr = IQR(value))


# A tibble: 6 x 4
# Groups:   c [3]
      c var     med   iqr
  <dbl> <chr> <dbl> <dbl>
1     1 323.1   1     1.5
2     1 454.3   4     2  
3     2 323.1   3     3  
4     2 454.3   6     2.5
5     3 323.1   3.5   0.5
6     3 454.3   4.5   2.5

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

An option is to melt first before performing your calculations:

melt(dt, id.vars="c")[, .(med = median(value), iqr = IQR(value)), .(c, variable)]

output:

   c variable med iqr
1: 1    323.1 1.0 1.5
2: 2    323.1 3.0 3.0
3: 3    323.1 3.5 0.5
4: 1    454.3 4.0 2.0
5: 2    454.3 6.0 2.5
6: 3    454.3 4.5 2.5

Upvotes: 1

Related Questions