shotes
shotes

Reputation: 55

Pivoting a Dataframe in R

Suppose I am given a dataframe with multiple columns that are factors and a column of interest, colA.

For example, suppose the dataframe looks like the following:

colA | colB | colC | colD
--------------------------
  1  |  'a' |  1   |  2 
  1  |  'b' |  2   |  3
  4  |  'b' |  2   |  4
  2  |  'a' |  3   |  1
  3  |  'a' |  2   |  6
  3  |  'b' |  1   |  6

I would like to summarize every column based on a group-by with colA, but structure it in a way so that the values for colB, colC, colD are spread on the rows, and the values for colA are spread on the columns. That is, I would like the count of colB values (a row per value of colB) when colA has the value 1, when colA has the value 2, and so on. The same for colC and colD. The resulting dataframe will look as follows:

        colA_value1 | colA_value2 | colA_value3 | colA_value4
        -----------------------------------------------------
colB_a |     1      |      1      |      1      |      0
colB_b |     1      |      0      |      1      |      1
colC_1 |     1      |      0      |      1      |      0
colC_2 |     1      |      0      |      1      |      1
colC_3 |     0      |      1      |      0      |      0
colD_1 |     0      |      1      |      0      |      0
colD_2 |     1      |      0      |      0      |      0
colD_3 |     1      |      0      |      0      |      0
colD_4 |     0      |      0      |      0      |      1
colD_6 |     0      |      0      |      2      |      0

Preference given towards using the tidyverse packages.

Upvotes: 1

Views: 2231

Answers (2)

Sandipan Dey
Sandipan Dey

Reputation: 23101

with reshape2, melt() + dcast()

library(reshape2)

df <- read.table(header=TRUE, text='colA | colB | colC | colD
              1  |  a |  1   |  2
              1  |  b |  2   |  3
              4  |  b |  2   |  4
              2  |  a |  3   |  1
              3  |  a |  2   |  6
              3  |  b |  1   |  6', sep='|')

df2 <- melt(df, id.vars = 'colA')
df2$value <- trimws(df2$value)
df2$colA <- paste('colA_value', df2$colA, sep='')
df2$variable_value <- paste(df2$variable, df2$value, sep='_')
dcast(df2, variable_value~colA, fun=length)

#   variable_value colA_value1 colA_value2 colA_value3 colA_value4
#1          colB_a           1           1           1           0
#2          colB_b           1           0           1           1
#3          colC_1           1           0           1           0
#4          colC_2           1           0           1           1
#5          colC_3           0           1           0           0
#6          colD_1           0           1           0           0
#7          colD_2           1           0           0           0
#8          colD_3           1           0           0           0
#9          colD_4           0           0           0           1
#10         colD_6           0           0           2           0

Upvotes: 1

alistaire
alistaire

Reputation: 43334

This can be done with a lot of tidyr:

library(tidyverse)

df <- data.frame(colA = c(1L, 1L, 4L, 2L, 3L, 3L), 
                 colB = c("a", "b", "b", "a", "a", "b"), 
                 colC = c(1L, 2L, 2L, 3L, 2L, 1L), 
                 colD = c(2L, 3L, 4L, 1L, 6L, 6L))

df %>% 
    gather(key, value, colA) %>% 
    unite(colA, key, value) %>% 
    gather(key, value, -colA) %>% 
    unite(col, key, value) %>% 
    count(colA, col) %>% 
    spread(colA, n, fill = 0)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> # A tibble: 10 x 5
#>    col    colA_1 colA_2 colA_3 colA_4
#>    <chr>   <dbl>  <dbl>  <dbl>  <dbl>
#>  1 colB_a      1      1      1      0
#>  2 colB_b      1      0      1      1
#>  3 colC_1      1      0      1      0
#>  4 colC_2      1      0      1      1
#>  5 colC_3      0      1      0      0
#>  6 colD_1      0      1      0      0
#>  7 colD_2      1      0      0      0
#>  8 colD_3      1      0      0      0
#>  9 colD_4      0      0      0      1
#> 10 colD_6      0      0      2      0

Upvotes: 0

Related Questions