Sean
Sean

Reputation: 103

Transforming a dataframe in r to apply pivot table

I have a data frame like below:

       Red  Green   Black
John    A   B       C
Sean    A   D       C
Tim     B   C       C

How can I transform it to below form to apply a pivot table (or if it can be done directly in r without transforming data):

Names   Code    Type
John    Red     A
John    Green   B
John    Black   C
Sean    Red     A
Sean    Green   D
Sean    Black   C
Tim     Red     B
Tim     Green   C
Tim     Black   C

So then my ultimate goal is to count the types as below by a pivot table on the transformed dataframe:

Count of Code for each type:

Row Labels  A   B   C   D   Grand Total
John            1   1   1       3      
Sean            1       1   1   3
Tim             1   2           3
Grand Total  2  2   4   1       9
```
reading similar topics did not help that much. 

Thanks in advance!
Regards

Upvotes: 0

Views: 270

Answers (3)

jaeyeon
jaeyeon

Reputation: 399

What you aim to do is (1) creating a contingency table and then (2) compute the sum of table entries for both rows and columns.

Step1: Create a contingency table

I first pivoted the data using pivot_longer() rather than gather() because it's more intuitive. Then, apply table() to the two variables of your interest.


# Toy example 

df <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), class = "data.frame", row.names = c("John", 
"Sean", "Tim"))

# Pivot the data 
long_df <- tibble::rownames_to_column(df, var = "Names") %>%
  tidyverse::pivot_longer(cols = c(-Names),
               names_to = "Type", 
               values_to = "Code") 

# Create a contingency table 
df_table <- table(long_df$Names, long_df$Code)

Step 2: Compute the sum of entries for both rows and columns.

Again, I only used a base R function margin.table(). Using this approach also allows you to save the sum of the row and column entries for further analysis.

# Grand total (margin = 1 indicates rows)
df_table %>%
  margin.table(margin = 1)

# Grand total (margin = 2 indicates columns)
df_table %>%
  margin.table(margin = 2)

Upvotes: 0

jay.sf
jay.sf

Reputation: 72593

You can use reshape(). I'm not sure about your data structure, if there is a column with names or if they are row names. I've added both versions.

reshape(dat1, idvar="Names",
        varying=2:4,
        v.names="Type", direction="long",
        timevar="Code", times=c("red", "green", "black"),
        new.row.names=1:9)

reshape(transform(dat2, Names=rownames(dat2)), idvar="Names",
        varying=1:3,
        v.names="Type", direction="long",
        timevar="Code", times=c("red", "green", "black"),
        new.row.names=1:9)

#     V1  Code Type
# 1 John   red    A
# 2 Sean   red    A
# 3  Tim   red    B
# 4 John black    B
# 5 Sean black    D
# 6  Tim black    C
# 7 John green    C
# 8 Sean green    C
# 9  Tim green    C

To get kind of a raw version you could do:

res <- reshape(transform(dat2, Names=rownames(dat2)), idvar="Names",
               varying=1:3,
               v.names="Type", direction="long",
               timevar="Code")
res
#        Names Code Type
# John.1  John    1    A
# Sean.1  Sean    1    A
# Tim.1    Tim    1    B
# John.2  John    2    B
# Sean.2  Sean    2    D
# Tim.2    Tim    2    C
# John.3  John    3    C
# Sean.3  Sean    3    C
# Tim.3    Tim    3    C

After that you may assign labels at will to "Code" column by transforming to factor like so:

res$Code <- factor(res$Code, labels=c("red", "green", "black"))

Data

dat1 <- structure(list(Names = c("John", "Sean", "Tim"), Red = c("A", 
"A", "B"), Green = c("B", "D", "C"), Black = c("C", "C", "C")), row.names = c(NA, 
-3L), class = "data.frame")

dat2 <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), row.names = c("John", "Sean", "Tim"
), class = "data.frame")

Upvotes: 0

r2evans
r2evans

Reputation: 160407

Using a literal dump from your first matrix-like frame above:

dat <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), class = "data.frame", row.names = c("John", 
"Sean", "Tim"))

I can do this:

library(dplyr)
library(tidyr)
tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names)
#   Names  Code Type
# 1  John   Red    A
# 2  Sean   Red    A
# 3   Tim   Red    B
# 4  John Green    B
# 5  Sean Green    D
# 6   Tim Green    C
# 7  John Black    C
# 8  Sean Black    C
# 9   Tim Black    C

We can extend that to get your next goal:

tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names) %>%
  xtabs(~ Names + Type, data = .)
#       Type
# Names  A B C D
#   John 1 1 1 0
#   Sean 1 0 1 1
#   Tim  0 1 2 0

which then just needs marginals:

tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names) %>%
  xtabs(~ Names + Type, data = .) %>%
  addmargins()
#       Type
# Names  A B C D Sum
#   John 1 1 1 0   3
#   Sean 1 0 1 1   3
#   Tim  0 1 2 0   3
#   Sum  2 2 4 1   9

Upvotes: 1

Related Questions