Reputation: 188
Anyone know a good way to match and categorize the first n digits of a number in R?
For example,
123451
123452
123461
123462
In this case, if we match on the first n = 1 to 4 digits, all four numbers fall into the same group. If we match on n = 5 digits, we get 2 groups.
I thought about doing this by converting the numeric vector to a character vector, truncating each element to its first n characters, and matching on those; however, I have a lot of numbers, and it seems there must be a better way to group by only the first n digits of a number in R. Any thoughts?
Thanks!
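For reference, the character-based idea described above might look roughly like the sketch below. The helper name `first_digits_chr` is made up for illustration, and it assumes non-negative whole numbers small enough to be represented exactly as doubles.

```r
# Hypothetical helper (not part of the original post): take the first n
# characters of each number's decimal representation.
first_digits_chr <- function(x, n) {
  # scientific = FALSE avoids strings such as "1e+05";
  # trim = TRUE drops the padding format() adds to align a vector
  s <- format(x, scientific = FALSE, trim = TRUE)
  substr(s, 1, n)
}

nums <- c(123451, 123452, 123461, 123462)

# Group the original numbers by their 5-digit prefix
groups <- split(nums, first_digits_chr(nums, 5))
```

Here `groups` is a list with one element per distinct prefix, so `length(groups)` gives the number of groups for a given n.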
Upvotes: 1
Views: 503
Reputation: 14902
Here's a vectorised solution that does not involve conversion to character:
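nums <- c(123451, 123452, 123461, 123462)

# Keep the first n digits of each element of x
# (assumes positive numbers with at least n digits)
firstDigits <- function(x, n) {
  ndigits <- floor(log10(x)) + 1   # digits before the decimal point
  floor(x / 10^(ndigits - n))      # shift right, drop the remainder
}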
factor(firstDigits(nums, 4))
## [1] 1234 1234 1234 1234
## Levels: 1234
factor(firstDigits(nums, 5))
## [1] 12345 12345 12346 12346
## Levels: 12345 12346
factor(firstDigits(nums, 6))
## [1] 123451 123452 123461 123462
## Levels: 123451 123452 123461 123462
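If the input might contain zeros, negative values, or numbers with fewer than n digits, the same arithmetic can be guarded a little. The following is only a sketch under those assumptions; `first_digits_safe` is a hypothetical name, and the choice to ignore the sign and to return short numbers unchanged is mine, not something stated above.

```r
# Hypothetical guarded variant of the same log10-based idea.
first_digits_safe <- function(x, n) {
  x <- abs(x)                      # assumption: the sign is ignored
  ndigits <- floor(log10(x)) + 1   # -Inf when x is 0
  # Clamp the shift at 0 so numbers with fewer than n digits (and 0)
  # are returned as-is instead of being multiplied up
  shift <- pmax(ndigits - n, 0)
  floor(x / 10^shift)
}

mixed <- c(123451, 42, 0, -98765)
prefixes <- first_digits_safe(mixed, 3)

# Group the original values by prefix
groups <- split(mixed, prefixes)
```

Because the result is numeric, it can be passed straight to `factor()`, `split()`, or `tapply()` without any conversion to character.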
Upvotes: 1