Reputation: 363
I have this code:
library(dplyr)

bucket <- seq(0, 100000, by = 5000)
dt <-
  data.frame(sold_amount = bucket) %>%
  mutate(bucket = cut(bucket, breaks = bucket, include.lowest = T, dig.lab = 10))
If I execute it, bucket [0, 5000] is duplicated; without include.lowest = T, the bucket for amount 0 is NA. How can I get bin [0,5000] for sold amount 0 and (5000,10000] for sold amount 5000?
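For reference, the behaviour described here can be reproduced with a minimal base R sketch (the vector x and the result names are illustrative, not part of the original code). By default cut() builds intervals that are closed on the right, so 5000 belongs to the first bin, and include.lowest only decides whether 0 joins that bin or becomes NA:

```r
bucket <- seq(0, 100000, by = 5000)
x <- c(0, 5000, 10000)  # illustrative values only

# Default right = TRUE: bins are (a, b], so 0 lies outside the first bin
without_lowest <- cut(x, breaks = bucket, dig.lab = 10)
as.character(without_lowest)
# NA "(0,5000]" "(5000,10000]"

# include.lowest = TRUE closes the first bin on the left: [0,5000]
with_lowest <- cut(x, breaks = bucket, include.lowest = TRUE, dig.lab = 10)
as.character(with_lowest)
# "[0,5000]" "[0,5000]" "(5000,10000]"
```

Either way 5000 stays in the first bin, which is why 0 and 5000 cannot get different labels as long as the bins are closed on the right.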
Upvotes: 0
Views: 139
Reputation: 2262
An approach with my santoku package:
library(santoku)
dt$bucket <- chop_width(dt$sold_amount, 5000, labels = lbl_intervals("%d"))
dt
   sold_amount          bucket
1            0       [0, 5000)
2         5000   [5000, 10000)
3        10000  [10000, 15000)
4        15000  [15000, 20000)
5        20000  [20000, 25000)
6        25000  [25000, 30000)
7        30000  [30000, 35000)
8        35000  [35000, 40000)
9        40000  [40000, 45000)
10       45000  [45000, 50000)
11       50000  [50000, 55000)
12       55000  [55000, 60000)
13       60000  [60000, 65000)
14       65000  [65000, 70000)
15       70000  [70000, 75000)
16       75000  [75000, 80000)
17       80000  [80000, 85000)
18       85000  [85000, 90000)
19       90000  [90000, 95000)
20       95000 [95000, 100000)
21      100000        {100000}
Upvotes: 0
Reputation: 1079
Maybe just remove the first row:
dt <-
  data.frame(sold_amount = bucket) %>%
  mutate(bucket = cut(bucket, breaks = bucket, include.lowest = T, dig.lab = 10)) %>%
  .[-1,]
dt
Upvotes: 0
Reputation: 101044
Maybe this?
cut(bucket, breaks = c(bucket,Inf), include.lowest = T, right = FALSE, dig.lab = 10)
such that
> dt <-
+ data.frame(sold_amount = bucket) %>%
+ mutate(bucket = cut(bucket, breaks = c(bucket, Inf), include.lowest = T, right = FALSE, dig.lab = .... [TRUNCATED]
> dt
   sold_amount         bucket
1            0       [0,5000)
2         5000   [5000,10000)
3        10000  [10000,15000)
4        15000  [15000,20000)
5        20000  [20000,25000)
6        25000  [25000,30000)
7        30000  [30000,35000)
8        35000  [35000,40000)
9        40000  [40000,45000)
10       45000  [45000,50000)
11       50000  [50000,55000)
12       55000  [55000,60000)
13       60000  [60000,65000)
14       65000  [65000,70000)
15       70000  [70000,75000)
16       75000  [75000,80000)
17       80000  [80000,85000)
18       85000  [85000,90000)
19       90000  [90000,95000)
20       95000 [95000,100000)
21      100000   [100000,Inf]
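If the open-ended [100000,Inf] bin is not wanted, one possible variation (a sketch only, not part of the answer as posted; the name capped is purely illustrative) is to keep right = FALSE but drop Inf from the breaks. According to the documentation of cut(), include.lowest applies to the highest break when right = FALSE, so 100000 should then fall into a closed last bin:

```r
bucket <- seq(0, 100000, by = 5000)

# right = FALSE gives [a, b) bins; include.lowest = TRUE then closes the
# last bin on the right, so 100000 is not turned into NA
capped <- cut(bucket, breaks = bucket, include.lowest = TRUE,
              right = FALSE, dig.lab = 10)

head(as.character(capped), 2)  # "[0,5000)" "[5000,10000)"
tail(as.character(capped), 2)  # "[95000,100000]" "[95000,100000]"
```

The trade-off is that 95000 and 100000 share the last bin, whereas the Inf version gives 100000 a bin of its own.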
Upvotes: 2
Reputation: 1550
A pragmatic approach would be to just filter out the offending line:
library(tidyverse)
bucket <- seq(0, 100000, by = 5000)
dt <-
  data.frame(sold_amount = bucket) %>%
  mutate(bucket = cut(bucket, breaks = bucket, include.lowest = T, dig.lab = 10)) %>%
  dplyr::filter(sold_amount != 0)
> dt
   sold_amount         bucket
1         5000       [0,5000]
2        10000   (5000,10000]
3        15000  (10000,15000]
4        20000  (15000,20000]
5        25000  (20000,25000]
6        30000  (25000,30000]
7        35000  (30000,35000]
8        40000  (35000,40000]
9        45000  (40000,45000]
10       50000  (45000,50000]
11       55000  (50000,55000]
12       60000  (55000,60000]
13       65000  (60000,65000]
14       70000  (65000,70000]
15       75000  (70000,75000]
16       80000  (75000,80000]
17       85000  (80000,85000]
18       90000  (85000,90000]
19       95000  (90000,95000]
20      100000 (95000,100000]
Upvotes: 0