Jay
Jay

Reputation: 915

Create bin intervals for a range in R for two different variables

I would like to create two bin intervals, one for "Sales", and another for "PCT change", for the Product Category at the same time. The "PCT change" varies from -ve to +ve, and Sales range from $2000 to $19000.

Below is my dataset.

Product,Sales,PCT Change
A,4769,0.72
A,9841,0.14
A,2594,-0.37
A,15970,-0.76
A,18142,0.9
A,19935,0.63
A,19023,0.63
A,4097,-0.31
A,17318,-0.38
A,15533,1.14
A,7596,0.74
A,4099,0.06
B,7294,-0.45
B,5473,-0.81
B,5813,-0.31
B,4084,0.8
B,10446,0.08
B,3820,0.12
B,3208,0.02
B,10100,-0.82
B,19125,0.14
B,7332,0.9
C,1870,0.28
C,5949,-0.62
C,2739,0.98
C,14388,0
C,19307,1.08
C,12312,0.68
C,15120,0.08
C,16456,0.24
C,1063,0.64
C,3587,-0.07
C,15480,0.31
C,8934,0.79
C,10392,0.57
C,9705,0.87
C,18181,0.93
C,4700,-0.88
C,7392,-0.79
C,7504,-0.75
C,14943,1.06

Below is the code i am trying.

df %>%
group_by(Product) %>%
mutate(freq=cut(Sales, breaks=seq(0,8000,by=50), include.lowest=TRUE))

I am not sure how the breaks will help.. I don't want to create more than 10 size of different bin. Also, not sure how to append the "PCT change" as well in the same code, to bring in two different intervals.

Expected output:

Product,Sales,PCT Change
    A,4769,0.72, [4000-5000],[0.65-0.78]
    A,9841,0.14 ,[9000-10000],[0.04-0.17]
    B,2594,-0.37,[2000-3000],[-0.39- -0.31]

Also, i am looking options to customize the range, suppose i want to bring 4500-5000 instead of 4000-5000. I am running a simulation actually, hence want to explore feasible options.

Any leads would be highly appreciated.

Thanks, J

Upvotes: 0

Views: 238

Answers (2)

GordonShumway
GordonShumway

Reputation: 2056

Expanding on the method you were using in the question:

sales_bin_size = 1000    

df %>%
    group_by(Product) %>%
    mutate(Sales_bin=cut(Sales, breaks=seq(0,round(max(Sales), -3),by=sales_bin_size), include.lowest=TRUE),
           PCT_change_bin=cut(`PCT Change`,breaks = 10, include.lowest=TRUE))

You can play around with the sales_bin_size to get the bin size you want. You can do the same for the PCT_change_bin by adding a new variable in a similar method.

Upvotes: 0

Dodge
Dodge

Reputation: 3309

A potential solution with data.table

Not absolutely certain if you want to examine the data to find appropriate bins or if you simply would like to filter data. This solution addresses the former--if this is not what you are looking for, then edit your question and I'll update my solution.

library(data.table)


DT[, sales.bins := paste0(round(floor(Sales), -3)-1000, "-", round(ceiling(Sales),-3)+1000)]
DT[, PCT.change.bins := paste0(floor(`PCT Change`)-1, "-", ceiling(`PCT Change`))]

This produces the following output:

> DT
    Product Sales PCT Change  sales.bins PCT.change.bins
 1:       A  4769       0.72   4000-6000            -1-1
 2:       A  9841       0.14  9000-11000            -1-1
 3:       A  2594      -0.37   2000-4000            -2-0
 4:       A 15970      -0.76 15000-17000            -2-0
 5:       A 18142       0.90 17000-19000            -1-1
 6:       A 19935       0.63 19000-21000            -1-1
 7:       A 19023       0.63 18000-20000            -1-1
 8:       A  4097      -0.31   3000-5000            -2-0
 9:       A 17318      -0.38 16000-18000            -2-0
10:       A 15533       1.14 15000-17000             0-2
11:       A  7596       0.74   7000-9000            -1-1
12:       A  4099       0.06   3000-5000            -1-1
13:       B  7294      -0.45   6000-8000            -2-0
14:       B  5473      -0.81   4000-6000            -2-0
15:       B  5813      -0.31   5000-7000            -2-0
16:       B  4084       0.80   3000-5000            -1-1
17:       B 10446       0.08  9000-11000            -1-1
18:       B  3820       0.12   3000-5000            -1-1
19:       B  3208       0.02   2000-4000            -1-1
20:       B 10100      -0.82  9000-11000            -2-0
21:       B 19125       0.14 18000-20000            -1-1
22:       B  7332       0.90   6000-8000            -1-1
23:       C  1870       0.28   1000-3000            -1-1
24:       C  5949      -0.62   5000-7000            -2-0
25:       C  2739       0.98   2000-4000            -1-1
26:       C 14388       0.00 13000-15000            -1-0
27:       C 19307       1.08 18000-20000             0-2
28:       C 12312       0.68 11000-13000            -1-1
29:       C 15120       0.08 14000-16000            -1-1
30:       C 16456       0.24 15000-17000            -1-1
31:       C  1063       0.64      0-2000            -1-1
32:       C  3587      -0.07   3000-5000            -2-0
33:       C 15480       0.31 14000-16000            -1-1
34:       C  8934       0.79  8000-10000            -1-1
35:       C 10392       0.57  9000-11000            -1-1
36:       C  9705       0.87  9000-11000            -1-1
37:       C 18181       0.93 17000-19000            -1-1
38:       C  4700      -0.88   4000-6000            -2-0
39:       C  7392      -0.79   6000-8000            -2-0
40:       C  7504      -0.75   7000-9000            -2-0
41:       C 14943       1.06 14000-16000             0-2
    Product Sales PCT Change  sales.bins PCT.change.bins

Find unique bins with the following:

> unique(DT$sales.bins)
 [1] "4000-6000"   "9000-11000"  "2000-4000"   "15000-17000" "17000-19000" "19000-21000" "18000-20000" "3000-5000"   "16000-18000"
[10] "7000-9000"   "6000-8000"   "5000-7000"   "1000-3000"   "13000-15000" "11000-13000" "14000-16000" "0-2000"      "8000-10000" 

> unique(DT$PCT.change.bins)
[1] "-1-1" "-2-0" "0-2"  "-1-0"

Upvotes: 1

Related Questions