Reputation: 561
I am trying to create a table with the number of items sold by product name, year and region. I want a table that looks like below. Is there a way to do this in R rather than writing a sql query using sqldf function?
Product_Name Region Year Count
English Muffins 1 2015 10000
Bagel 1 2015 5601
Croissants ....................
Here is the code to generate sample data. This dummy data does not correspond to the sample counts above.
Product_Name <- c("English Muffins","croissants","Kaiser rolls","Bagels","cinnamon puff","strawberry pastry")
Region_ID <- c(1:6)
Transaction_year <- c(2011:2016)
x <- data.frame()
for(i in 1:6)
{
for (j in 1:6)
{
for(k in 1:6)
{
x <- rbind(x, data.frame(Product = Product_Name[i], Region = Region_ID[j], Year = Transaction_year[k]))
}
}
}
Upvotes: 1
Views: 600
Reputation: 263332
The base function as.data.frame.table
will do this. I'm assuming you either have or can make an R contingency table along these lines:
mt <- with(x, table(Product,Region,Year))
You then get the desired "long format" object with:
str(as.data.frame(mt))
'data.frame': 216 obs. of 4 variables:
$ Product: Factor w/ 6 levels "English Muffins",..: 1 2 3 4 5 6 1 2 3 4 ...
$ Region : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Year : Factor w/ 6 levels "2011","2012",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Freq : int 1 1 1 1 1 1 1 1 1 1 ...
The other useful table-flattening function is ftable
. For a three way table it presents a more compact version of the display that print.table
would yield :
ftable(mt)
Year 2011 2012 2013 2014 2015 2016
Product Region
English Muffins 1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
4 1 1 1 1 1 1
5 1 1 1 1 1 1
6 1 1 1 1 1 1
croissants 1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
4 1 1 1 1 1 1
5 1 1 1 1 1 1
6 1 1 1 1 1 1
Kaiser rolls 1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
#-----snipped output--------
On the other hand if the request is to replicate numbers of rows by the Count variable then it would be done thusly:
#Makes something like your original dataframe:
orig <- structure(list(Product_Name = structure(c(2L, 1L), .Label = c("Bagel",
"English_Muffins"), class = "factor"), Region = c(1L, 1L), Year = c(2015L,
2015L), Count = c(5L, 4L)), .Names = c("Product_Name", "Region",
"Year", "Count"), class = "data.frame", row.names = c(NA, -2L))
xlong <- orig[ rep(rownames(orig), orig$Count) , ]
> xlong
Product_Name Region Year Count
1 English_Muffins 1 2015 5
1.1 English_Muffins 1 2015 5
1.2 English_Muffins 1 2015 5
1.3 English_Muffins 1 2015 5
1.4 English_Muffins 1 2015 5
2 Bagel 1 2015 4
2.1 Bagel 1 2015 4
2.2 Bagel 1 2015 4
2.3 Bagel 1 2015 4
Upvotes: 3
Reputation: 42010
There's no need for complicated code here. All you need is one line of code:
> as.data.frame(table(x))
Product Region Year Freq
1 English Muffins 1 2011 1
2 croissants 1 2011 1
3 Kaiser rolls 1 2011 1
4 Bagels 1 2011 1
5 cinnamon puff 1 2011 1
6 strawberry pastry 1 2011 1
...
The table
function produces the contingency table as a 3-dimensional array, and as.data.frame
converts the contingency table to a data frame in the format that you want. If x
contains other columns, make sure to subset it to only the columns you want to tabulate.
Upvotes: 3
Reputation: 14360
Yes, you can do this by using data.table
and a by
statement. Very similar to a SQL
group-by:
library(data.table)
setDT(x)[,count := .N, by = c("Product","Region","Year") ]
head(x)
Product Region Year count
1: English Muffins 1 2011 1
2: English Muffins 1 2012 1
3: English Muffins 1 2013 1
4: English Muffins 1 2014 1
5: English Muffins 1 2015 1
6: English Muffins 1 2016 1
Upvotes: 3
Reputation: 23214
Product_Name <- c("English Muffins","croissants","Kaiser rolls","Bagels","cinnamon puff","strawberry pastry")
Region_ID <- c(1:6)
Transaction_year <- c(2011:2016)
x <- data.frame()
for(i in 1:6)
{
for (j in 1:6)
{
for(k in 1:6)
{
x <- rbind(x, data.frame(Product = Product_Name[i], Region = Region_ID[j], Year = Transaction_year[k]))
}
}
}
x$count <- 1
xx <- aggregate(x[,"count"],by=list(x$Product,x$Year,x$Region),sum)
colnames(xx) <- c("Product", "Year", "Region", "Count")
head(xx)
Product Year Region Count
1 English Muffins 2011 1 1
2 croissants 2011 1 1
3 Kaiser rolls 2011 1 1
4 Bagels 2011 1 1
5 cinnamon puff 2011 1 1
6 strawberry pastry 2011 1 1
Upvotes: 3