Lee88
Lee88

Reputation: 1295

dataframes in R with multi-level rows and columns

Let's say I have the following data frame.

> df = data.frame(rowsA = sample(c('A','B','C'), 100, replace=TRUE),
                rowsB = sample(c('D','E','F'), 100, replace=TRUE),
                colsA = sample(c('G','H','I'), 100, replace=TRUE),
                colsB = sample(c('J','K','L'), 100, replace=TRUE))

> head(df)
  rowsA rowsB colsA colsB
1     B     E     I     L
2     A     E     G     J
3     A     E     H     K
4     A     D     I     J
5     C     F     G     J
6     A     F     G     J

Is it possible to create a multi-level table of counts?

In excel, it is possible with the PivotTable functionality

enter image description here

I think it possible in python in pandas with the df.columns.levels method.

I also figured out how to do multi-level rows only in R with dplyr (but haven't figured out multi-level columns)

df %>%
    group_by(rowsA, rowsB, colsA) %>%
    summarise(count = n()) %>%
    spread(colsA, count)

# A tibble: 9 x 5
# Groups:   rowsA, rowsB [9]
   rowsA  rowsB     G     H     I
* <fctr> <fctr> <int> <int> <int>
1      A      D     5     3     1
2      A      E     1     2     1
3      A      F     5     8    NA
4      B      D     5     5     5
5      B      E     2     4     6
6      B      F     4     6     5
7      C      D     2     6    NA
8      C      E     6     5     3
9      C      F     4     3     3

Upvotes: 0

Views: 4471

Answers (1)

akuiper
akuiper

Reputation: 214957

Paste the columns that goes to the header into one column, then reshape it, in which way you have a contingency table with the same meaning as the multi level count:

library(dplyr); library(tidyr)
df %>% 
    unite(header, c('colsA', 'colsB')) %>% 
    count(rowsA, rowsB, header) %>% 
    spread(header, n, fill = 0)

# A tibble: 9 x 11
#   rowsA  rowsB   G_J   G_K   G_L   H_J   H_K   H_L   I_J   I_K   I_L
#* <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1      A      D     1     0     0     0     3     1     1     1     0
#2      A      E     2     0     0     1     1     0     0     0     1
#3      A      F     5     0     0     3     2     1     0     1     1
#4      B      D     0     1     1     1     0     3     1     1     1
#5      B      E     2     2     1     3     1     1     0     3     1
#6      B      F     1     1     2     3     3     0     1     2     1
#7      C      D     0     2     3     1     2     0     4     3     2
#8      C      E     2     2     2     1     2     0     0     1     1
#9      C      F     1     0     1     2     0     1     2     1     2

Or if you are OK with a table/array/matrix as result, you can use xtabs, (borrowed from this answer), which essentially gives a 4-d array but with ftable, it can be displayed as you need:

ftable(xtabs(data = df), row.vars = 1:2, col.vars = 3:4)

#            colsA G     H     I    
#            colsB J K L J K L J K L
#rowsA rowsB                        
#A     D           1 0 0 0 3 1 1 1 0
#      E           2 0 0 1 1 0 0 0 1
#      F           5 0 0 3 2 1 0 1 1
#B     D           0 1 1 1 0 3 1 1 1
#      E           2 2 1 3 1 1 0 3 1
#      F           1 1 2 3 3 0 1 2 1
#C     D           0 2 3 1 2 0 4 3 2
#      E           2 2 2 1 2 0 0 1 1
#      F           1 0 1 2 0 1 2 1 2

Upvotes: 1

Related Questions