Reputation: 23
I am quite new to r so apologies if this is basic (I am sure it is) but I haven’t been able to find an answer anywhere.
I have a data set showing individual product sales with about 500,000 rows and 30 columns, including region, sale amount, and date. I am using basic data.table functionality to create summaries based on single groups, like sales by date and sales by region(i.e. DT[,sum(Sales),by=Date]), which look like:
Date Sales
12/1 50,000
12/2 70,000
12/3 40,000
And
Region Sales
North 67000
East 60000
South 15000
West 22000
What I’d like to do is view (for example) sales broken down by region and date, so something like:
12/1 12/2 12/3
North 10,000 15,000 12,000
East 5,000 25,000 30,000
South 2,000 3,000 10,000
West 6,000 7,000 9,000
I’d like to be able to do this for all dates included in the file ( so I don’t need to change it manually every time I run this for different data sets and dates). Is this possible with data.table or something else?
Thanks!
Upvotes: 2
Views: 150
Reputation: 886978
For large datasets, the dcast
from data.table
should be efficient.
dcast(DT, Region~Date, value.var="Sales", sum)
NOTE: Here, we assume the 'Sales' as numeric
class. From the data showed in the example, there are ,
between numbers. So, it could be non-numeric. If it non-numeric, convert to numeric
after removing the ,
DT[, Sales := as.numeric(gsub(",", "", Sales))]
and then do the dcast
Upvotes: 3