user3897
user3897

Reputation: 561

Convert n way contingency table to a dataframe in R

I am trying to create a table with the number of items sold by product name, year and region. I want a table that looks like below. Is there a way to do this in R rather than writing a sql query using sqldf function?

Product_Name      Region     Year     Count
English Muffins        1     2015     10000
Bagel                  1     2015      5601
Croissants              ....................

Here is the code to generate sample data. This dummy data does not correspond to the sample counts above.

Product_Name <- c("English Muffins","croissants","Kaiser rolls","Bagels","cinnamon puff","strawberry pastry")
Region_ID <- c(1:6)
Transaction_year <- c(2011:2016)

x <- data.frame()
for(i in 1:6)
  {
  for (j in 1:6)
    { 
    for(k in 1:6)
      {
      x <- rbind(x, data.frame(Product = Product_Name[i], Region = Region_ID[j], Year = Transaction_year[k]))
      }
    }
  }

Upvotes: 1

Views: 600

Answers (4)

IRTFM
IRTFM

Reputation: 263332

The base function as.data.frame.table will do this. I'm assuming you either have or can make an R contingency table along these lines:

mt <- with(x, table(Product,Region,Year))

You then get the desired "long format" object with:

 str(as.data.frame(mt))

'data.frame':   216 obs. of  4 variables:
 $ Product: Factor w/ 6 levels "English Muffins",..: 1 2 3 4 5 6 1 2 3 4 ...
 $ Region : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ Year   : Factor w/ 6 levels "2011","2012",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Freq   : int  1 1 1 1 1 1 1 1 1 1 ...

The other useful table-flattening function is ftable. For a three way table it presents a more compact version of the display that print.table would yield :

ftable(mt)
                         Year 2011 2012 2013 2014 2015 2016
Product           Region                                   
English Muffins   1              1    1    1    1    1    1
                  2              1    1    1    1    1    1
                  3              1    1    1    1    1    1
                  4              1    1    1    1    1    1
                  5              1    1    1    1    1    1
                  6              1    1    1    1    1    1
croissants        1              1    1    1    1    1    1
                  2              1    1    1    1    1    1
                  3              1    1    1    1    1    1
                  4              1    1    1    1    1    1
                  5              1    1    1    1    1    1
                  6              1    1    1    1    1    1
Kaiser rolls      1              1    1    1    1    1    1
                  2              1    1    1    1    1    1
                  3              1    1    1    1    1    1
#-----snipped output--------

On the other hand if the request is to replicate numbers of rows by the Count variable then it would be done thusly:

#Makes something like your original dataframe:
orig <- structure(list(Product_Name = structure(c(2L, 1L), .Label = c("Bagel", 
"English_Muffins"), class = "factor"), Region = c(1L, 1L), Year = c(2015L, 
2015L), Count = c(5L, 4L)), .Names = c("Product_Name", "Region", 
"Year", "Count"), class = "data.frame", row.names = c(NA, -2L))

xlong <- orig[ rep(rownames(orig), orig$Count) , ]
    > xlong
       Product_Name Region Year Count
1   English_Muffins      1 2015     5
1.1 English_Muffins      1 2015     5
1.2 English_Muffins      1 2015     5
1.3 English_Muffins      1 2015     5
1.4 English_Muffins      1 2015     5
2             Bagel      1 2015     4
2.1           Bagel      1 2015     4
2.2           Bagel      1 2015     4
2.3           Bagel      1 2015     4

Upvotes: 3

Ryan C. Thompson
Ryan C. Thompson

Reputation: 42010

There's no need for complicated code here. All you need is one line of code:

> as.data.frame(table(x))
              Product Region Year Freq
1     English Muffins      1 2011    1
2          croissants      1 2011    1
3        Kaiser rolls      1 2011    1
4              Bagels      1 2011    1
5       cinnamon puff      1 2011    1
6   strawberry pastry      1 2011    1
...

The table function produces the contingency table as a 3-dimensional array, and as.data.frame converts the contingency table to a data frame in the format that you want. If x contains other columns, make sure to subset it to only the columns you want to tabulate.

Upvotes: 3

Mike H.
Mike H.

Reputation: 14360

Yes, you can do this by using data.table and a by statement. Very similar to a SQL group-by:

library(data.table)
setDT(x)[,count := .N, by = c("Product","Region","Year") ]
head(x)

           Product Region Year count
1: English Muffins      1 2011     1
2: English Muffins      1 2012     1
3: English Muffins      1 2013     1
4: English Muffins      1 2014     1
5: English Muffins      1 2015     1
6: English Muffins      1 2016     1

Upvotes: 3

Hack-R
Hack-R

Reputation: 23214

Product_Name <- c("English Muffins","croissants","Kaiser rolls","Bagels","cinnamon puff","strawberry pastry")
Region_ID <- c(1:6)
Transaction_year <- c(2011:2016)

x <- data.frame()
for(i in 1:6)
{
  for (j in 1:6)
  { 
    for(k in 1:6)
    {
      x <- rbind(x, data.frame(Product = Product_Name[i], Region = Region_ID[j], Year = Transaction_year[k]))
    }
  }
}

x$count <- 1

xx <- aggregate(x[,"count"],by=list(x$Product,x$Year,x$Region),sum)
colnames(xx) <- c("Product", "Year", "Region", "Count")
head(xx)

            Product Year Region Count
1   English Muffins 2011      1     1
2        croissants 2011      1     1
3      Kaiser rolls 2011      1     1
4            Bagels 2011      1     1
5     cinnamon puff 2011      1     1
6 strawberry pastry 2011      1     1

Upvotes: 3

Related Questions