Stamb
Stamb

Reputation: 23

Grouping by columns and rows in data.table in r

I am quite new to r so apologies if this is basic (I am sure it is) but I haven’t been able to find an answer anywhere.

I have a data set showing individual product sales with about 500,000 rows and 30 columns, including region, sale amount, and date. I am using basic data.table functionality to create summaries based on single groups, like sales by date and sales by region(i.e. DT[,sum(Sales),by=Date]), which look like:

Date    Sales
12/1    50,000
12/2    70,000
12/3    40,000

And

Region  Sales
North   67000
East    60000
South   15000
West    22000

What I’d like to do is view (for example) sales broken down by region and date, so something like:

        12/1    12/2    12/3
North   10,000  15,000  12,000
East    5,000   25,000  30,000
South   2,000   3,000   10,000
West    6,000   7,000   9,000

I’d like to be able to do this for all dates included in the file ( so I don’t need to change it manually every time I run this for different data sets and dates). Is this possible with data.table or something else?

Thanks!

Upvotes: 2

Views: 150

Answers (1)

akrun
akrun

Reputation: 886978

For large datasets, the dcast from data.table should be efficient.

dcast(DT, Region~Date, value.var="Sales", sum)

NOTE: Here, we assume the 'Sales' as numeric class. From the data showed in the example, there are , between numbers. So, it could be non-numeric. If it non-numeric, convert to numeric after removing the ,

DT[, Sales := as.numeric(gsub(",", "", Sales))]

and then do the dcast

Upvotes: 3

Related Questions