Reputation: 915
I would like to create two bin intervals, one for "Sales", and another for "PCT change", for the Product Category at the same time. The "PCT change" varies from -ve to +ve, and Sales range from $2000 to $19000.
Below is my dataset.
Product,Sales,PCT Change
A,4769,0.72
A,9841,0.14
A,2594,-0.37
A,15970,-0.76
A,18142,0.9
A,19935,0.63
A,19023,0.63
A,4097,-0.31
A,17318,-0.38
A,15533,1.14
A,7596,0.74
A,4099,0.06
B,7294,-0.45
B,5473,-0.81
B,5813,-0.31
B,4084,0.8
B,10446,0.08
B,3820,0.12
B,3208,0.02
B,10100,-0.82
B,19125,0.14
B,7332,0.9
C,1870,0.28
C,5949,-0.62
C,2739,0.98
C,14388,0
C,19307,1.08
C,12312,0.68
C,15120,0.08
C,16456,0.24
C,1063,0.64
C,3587,-0.07
C,15480,0.31
C,8934,0.79
C,10392,0.57
C,9705,0.87
C,18181,0.93
C,4700,-0.88
C,7392,-0.79
C,7504,-0.75
C,14943,1.06
Below is the code i am trying.
df %>%
group_by(Product) %>%
mutate(freq=cut(Sales, breaks=seq(0,8000,by=50), include.lowest=TRUE))
I am not sure how the breaks will help.. I don't want to create more than 10 size of different bin. Also, not sure how to append the "PCT change" as well in the same code, to bring in two different intervals.
Expected output:
Product,Sales,PCT Change
A,4769,0.72, [4000-5000],[0.65-0.78]
A,9841,0.14 ,[9000-10000],[0.04-0.17]
B,2594,-0.37,[2000-3000],[-0.39- -0.31]
Also, i am looking options to customize the range, suppose i want to bring 4500-5000 instead of 4000-5000. I am running a simulation actually, hence want to explore feasible options.
Any leads would be highly appreciated.
Thanks, J
Upvotes: 0
Views: 238
Reputation: 2056
Expanding on the method you were using in the question:
sales_bin_size = 1000
df %>%
group_by(Product) %>%
mutate(Sales_bin=cut(Sales, breaks=seq(0,round(max(Sales), -3),by=sales_bin_size), include.lowest=TRUE),
PCT_change_bin=cut(`PCT Change`,breaks = 10, include.lowest=TRUE))
You can play around with the sales_bin_size
to get the bin size you want. You can do the same for the PCT_change_bin
by adding a new variable in a similar method.
Upvotes: 0
Reputation: 3309
A potential solution with data.table
Not absolutely certain if you want to examine the data to find appropriate bins or if you simply would like to filter data. This solution addresses the former--if this is not what you are looking for, then edit your question and I'll update my solution.
library(data.table)
DT[, sales.bins := paste0(round(floor(Sales), -3)-1000, "-", round(ceiling(Sales),-3)+1000)]
DT[, PCT.change.bins := paste0(floor(`PCT Change`)-1, "-", ceiling(`PCT Change`))]
This produces the following output:
> DT
Product Sales PCT Change sales.bins PCT.change.bins
1: A 4769 0.72 4000-6000 -1-1
2: A 9841 0.14 9000-11000 -1-1
3: A 2594 -0.37 2000-4000 -2-0
4: A 15970 -0.76 15000-17000 -2-0
5: A 18142 0.90 17000-19000 -1-1
6: A 19935 0.63 19000-21000 -1-1
7: A 19023 0.63 18000-20000 -1-1
8: A 4097 -0.31 3000-5000 -2-0
9: A 17318 -0.38 16000-18000 -2-0
10: A 15533 1.14 15000-17000 0-2
11: A 7596 0.74 7000-9000 -1-1
12: A 4099 0.06 3000-5000 -1-1
13: B 7294 -0.45 6000-8000 -2-0
14: B 5473 -0.81 4000-6000 -2-0
15: B 5813 -0.31 5000-7000 -2-0
16: B 4084 0.80 3000-5000 -1-1
17: B 10446 0.08 9000-11000 -1-1
18: B 3820 0.12 3000-5000 -1-1
19: B 3208 0.02 2000-4000 -1-1
20: B 10100 -0.82 9000-11000 -2-0
21: B 19125 0.14 18000-20000 -1-1
22: B 7332 0.90 6000-8000 -1-1
23: C 1870 0.28 1000-3000 -1-1
24: C 5949 -0.62 5000-7000 -2-0
25: C 2739 0.98 2000-4000 -1-1
26: C 14388 0.00 13000-15000 -1-0
27: C 19307 1.08 18000-20000 0-2
28: C 12312 0.68 11000-13000 -1-1
29: C 15120 0.08 14000-16000 -1-1
30: C 16456 0.24 15000-17000 -1-1
31: C 1063 0.64 0-2000 -1-1
32: C 3587 -0.07 3000-5000 -2-0
33: C 15480 0.31 14000-16000 -1-1
34: C 8934 0.79 8000-10000 -1-1
35: C 10392 0.57 9000-11000 -1-1
36: C 9705 0.87 9000-11000 -1-1
37: C 18181 0.93 17000-19000 -1-1
38: C 4700 -0.88 4000-6000 -2-0
39: C 7392 -0.79 6000-8000 -2-0
40: C 7504 -0.75 7000-9000 -2-0
41: C 14943 1.06 14000-16000 0-2
Product Sales PCT Change sales.bins PCT.change.bins
Find unique bins with the following:
> unique(DT$sales.bins)
[1] "4000-6000" "9000-11000" "2000-4000" "15000-17000" "17000-19000" "19000-21000" "18000-20000" "3000-5000" "16000-18000"
[10] "7000-9000" "6000-8000" "5000-7000" "1000-3000" "13000-15000" "11000-13000" "14000-16000" "0-2000" "8000-10000"
> unique(DT$PCT.change.bins)
[1] "-1-1" "-2-0" "0-2" "-1-0"
Upvotes: 1