I have a large integer vector (a sample of the data is shown below):
a <- c(0,0,0,1,1,2,2,2,4,4,7,7,7,35,50,50, 200,200,500,500,500, 2500,2501,2502,2502)
I would like to create another vector, b, that assigns each value of a to a bin: 1 for values 0-6, 2 for 7-13, 3 for 14-20, and so on.
I know I could use dplyr's case_when() inside mutate(), but with large data that may not be efficient.
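Because each bin in the rule above is exactly 7 wide and the first one starts at 0, the category can be computed arithmetically as (value %/% 7) + 1. That is a single vectorized operation rather than one condition per bin. A minimal sketch, assuming a holds only non-negative whole numbers as in the sample:

```r
a <- c(0,0,0,1,1,2,2,2,4,4,7,7,7,35,50,50, 200,200,500,500,500, 2500,2501,2502,2502)

# Assumes non-negative whole numbers: 0-6 -> 1, 7-13 -> 2, 14-20 -> 3, ...
# %/% is R's integer division and is applied element-wise to the whole vector
b <- a %/% 7 + 1
print(b)
```

For example, 35 %/% 7 + 1 is 6, which matches the 35-41 bin.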
The best way to categorize numeric data into ranges with a numeric output value is the findInterval function. Examples:
> a <- c(0,0,0,1,1,2,2,2,4,4,7,7,7,35,50,50, 200,200,500,500,500, 2500,2501,2502,2502)
> findInterval( a, c(0, 6, 12, 18, 24))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5
> findInterval( a, 6^(0:6))
[1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 5
> 6^(0:6)
[1] 1 6 36 216 1296 7776 46656
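The breaks in the first example are 6 apart, whereas the bins described in the question (0-6, 7-13, 14-20, ...) start at multiples of 7. A sketch of the same findInterval call with breaks matching that rule, assuming non-negative values as in the sample (breaks is just an illustrative variable name):

```r
a <- c(0,0,0,1,1,2,2,2,4,4,7,7,7,35,50,50, 200,200,500,500,500, 2500,2501,2502,2502)

# Breaks at 0, 7, 14, ..., up to the largest multiple of 7 not above max(a).
# Values at or above the last break get length(breaks), which is the
# correct bin here because that last bin is still 7 wide.
breaks <- seq(0, max(a), by = 7)
b <- findInterval(a, breaks)   # integer vector of bin numbers
print(b)
```

This gives the same bin numbers as plain integer division by 7 plus 1. Unlike integer division, however, findInterval also handles unequal break spacing, as in the 6^(0:6) example.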
Note that items below the smallest value in the second argument (the vec, i.e. breaks, argument) get 0, and items at or above the largest value get length(vec). The intervals are left-closed and right-open, [vec[i], vec[i+1]), which is the opposite of the cut function's right-closed default. Either behavior can be changed with parameters: left.open = TRUE in findInterval, or right = FALSE in cut.
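A small sketch of those boundary rules on made-up values (x and brks are illustrative names), comparing findInterval with cut. Note that cut returns NA, not 0 or length(breaks), for values outside the breaks:

```r
x    <- c(-1, 0, 5, 6, 7, 24, 30)
brks <- c(0, 6, 12, 18, 24)

# findInterval() default: intervals [brks[i], brks[i+1]);
# x < brks[1] gives 0, x >= last break gives length(brks)
fi_default <- findInterval(x, brks)                      # 0 1 1 2 2 5 5

# left.open = TRUE: intervals (brks[i], brks[i+1]]
fi_left_open <- findInterval(x, brks, left.open = TRUE)  # 0 0 1 1 2 4 5

# cut() default (right = TRUE): intervals (a, b]; values outside the breaks,
# including x == 0 because include.lowest = FALSE, become NA
cut_default <- as.integer(cut(x, brks))                  # NA NA 1 1 2 4 NA

# right = FALSE makes cut() left-closed like findInterval(),
# but now the top value 24 falls outside and becomes NA
cut_left <- as.integer(cut(x, brks, right = FALSE))      # NA 1 1 2 2 NA NA
```

Inside the break range, findInterval(x, brks) and cut(x, brks, right = FALSE) assign the same codes. The two functions differ only at the edges.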