Reputation: 1036
I have some numeric variables which are categorised into a few bands (like 1-3, 3-5, 5-7 etc). I want to maintain their band order. For example, consider the data frame below.
df <- data.frame(x = c("1-3", "3-5","5-9", "9-10", "10-12"))
When I run any data manipulation operation (like group_by or count) on this column, it returns this output.
Current Output
library(tidyverse)
df %>% count(x)
x n
<fct> <int>
1 1-3 1
2 3-5 1
3 5-9 1
4 9-10 1
5 10-12 1
Desired Output
x n
<fct> <int>
1 1-3 1
2 3-5 1
3 5-9 1
4 9-10 1
5 10-12 1
Important note: the solution should be dynamic, meaning it should work on any set of numeric bands, even if they start from 1000 or any other numeric value (for example 1250 - 2500, 2500 - 5000, 5000 - 10000, 10000 - 20000 etc). A dplyr solution is preferred.
Upvotes: 0
Views: 124
Reputation: 389335
If x is always sorted and in the same order as shown in the example, you could set the factor levels based on their order of appearance before using count.
library(dplyr)
library(rlang)
df %>%
  mutate(x = factor(x, levels = unique(x))) %>%
  count(x)
However, a general solution would be to extract the number before "-" (the lower bound of each band) and order the factor levels by that.
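df %>%
  # x1 is the numeric lower bound of each band
  mutate(x1 = as.numeric(sub('-.*', '', x)),
         # unique() avoids duplicated levels when a band occurs more than once
         x = factor(x, levels = unique(x[order(x1)]))) %>%
  count(x)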
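To wrap this in a function, we can use:
count_band_data <- function(data, col, sep = '-') {
  data %>%
    # sep is used as a regular expression; temp holds each band's lower bound
    mutate(temp = as.numeric(sub(paste0(sep, '.*'), '', {{col}})),
           # unique() keeps the levels valid when a band occurs more than once
           {{col}} := factor({{col}}, levels = unique({{col}}[order(temp)]))) %>%
    count({{col}})
}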
and then use it as:
df %>% count_band_data(x)
# A tibble: 5 x 2
# x n
# <fct> <int>
#1 1-3 1
#2 3-5 1
#3 5-9 1
#4 9-10 1
#5 10-12 1
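To check the "dynamic" requirement from the question, here is a minimal self-contained sketch of the same lower-bound ordering, applied to the wider bands mentioned in the question's note. The data frame bands, the column lower, and the shuffled, repeated rows are made up for illustration. It relies on as.numeric() tolerating the trailing space left after stripping everything from "-" onwards.

```r
library(dplyr)

# Hypothetical data: bands from the question's note, shuffled, with one repeat
bands <- data.frame(
  x = c("5000 - 10000", "1250 - 2500", "10000 - 20000",
        "2500 - 5000", "1250 - 2500"),
  stringsAsFactors = FALSE
)

result <- bands %>%
  # lower is the numeric lower bound of each band (text before the first "-")
  mutate(lower = as.numeric(sub("-.*", "", x)),
         # order the distinct bands by lower bound and use them as factor levels
         x = factor(x, levels = unique(x[order(lower)]))) %>%
  count(x)

result
```

The rows should come back in numeric band order (1250 - 2500 first, 10000 - 20000 last), with the repeated band counted twice.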
Upvotes: 1