john

Reputation: 1036

Sort Numeric Bands in R

I have some numeric variables that are categorised into bands (like 1-3, 3-5, 5-7, etc.). I want to maintain their band order. For example, consider the data frame below.

df <- data.frame(x = c("1-3", "3-5","5-9", "9-10", "10-12"))

When I run any data manipulation operation (like group_by or count) on this column, it returns this output.

Current Output

library(tidyverse)
df %>% count(x)

  x         n
  <fct> <int>
1 1-3       1
2 10-12     1
3 3-5       1
4 5-9       1
5 9-10      1

Desired Output

  x         n
  <fct> <int>
1 1-3       1
2 3-5       1
3 5-9       1
4 9-10      1
5 10-12     1

Important note: the solution should be dynamic, meaning it should work on any numeric bands, even if they start from 1000 or any other value (for example 1250 - 2500, 2500 - 5000, 5000 - 10000, 10000 - 20000, etc.). A dplyr solution is preferred.

Upvotes: 0

Views: 124

Answers (1)

Ronak Shah

Reputation: 389335

If x is always sorted and in the same order as shown in the example, you could set the factor levels by order of appearance before using count.

library(dplyr)
library(rlang)

df %>%
  mutate(x = factor(x, levels = unique(x))) %>% 
  count(x)

However, a general solution would be to get the number before "-" and arrange data based on that.

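df %>%
  # x1 is the numeric lower bound of each band (the text before "-")
  mutate(x1 = as.numeric(sub('-.*', '', x)), 
         # unique() keeps the levels distinct when a band occurs more than once
         x = factor(x, levels = unique(x[order(x1)]))) %>%
  count(x)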
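As a rough check of the "dynamic" requirement, here is a minimal sketch that runs the same idea on larger bands. The data frame `bands` and the column `lower` are made up for illustration; the rows are deliberately unsorted, one band is repeated, and the separator has spaces around it. `trimws()` and `unique()` are defensive additions for that kind of input.

```r
library(dplyr)

# Hypothetical data: unsorted bands, one repeated, spaces around the separator
bands <- data.frame(
  x = c("5000 - 10000", "1250 - 2500", "10000 - 20000",
        "2500 - 5000", "1250 - 2500"),
  stringsAsFactors = FALSE
)

result <- bands %>%
  # lower = numeric lower bound: drop everything from "-" on, trim, convert
  mutate(lower = as.numeric(trimws(sub("-.*", "", x))),
         # order the distinct bands by their lower bound
         x = factor(x, levels = unique(x[order(lower)]))) %>%
  count(x)

print(result)
```

Because count() groups by the factor, the rows come back in level order (lowest band first) rather than alphabetical order.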

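To wrap this in a function, we can use:

count_band_data <- function(data, col, sep = '-') {
  data %>%
    # temp holds the numeric lower bound of each band (the text before sep)
    mutate(temp = as.numeric(sub(paste0(sep, '.*'), '', {{col}})),
           # unique() keeps factor levels distinct when a band repeats
           {{col}} := factor({{col}}, levels = unique({{col}}[order(temp)]))) %>%
    count({{col}})
}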

and then use it as:

df %>% count_band_data(x) 


# A tibble: 5 x 2
#  x         n
#  <fct> <int>
#1 1-3       1
#2 3-5       1
#3 5-9       1
#4 9-10      1
#5 10-12     1
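The `sep` argument is not exercised above, so here is a small sketch of how it might be used with a different separator. The function is restated so the block runs on its own, with `unique()` around the levels as a defensive addition in case a band repeats; the data frame `df2` and the " to " separator are hypothetical. Note that `sep` is pasted into a regular expression, so a separator containing regex metacharacters (such as "." or "|") would need escaping.

```r
library(dplyr)

# Restated helper (assumption: unique() added so repeated bands do not
# produce duplicated factor levels)
count_band_data <- function(data, col, sep = "-") {
  data %>%
    mutate(temp = as.numeric(sub(paste0(sep, ".*"), "", {{ col }})),
           {{ col }} := factor({{ col }},
                               levels = unique({{ col }}[order(temp)]))) %>%
    count({{ col }})
}

# Hypothetical data using " to " instead of "-"
df2 <- data.frame(
  x = c("9 to 10", "1 to 3", "10 to 12", "1 to 3"),
  stringsAsFactors = FALSE
)

result2 <- df2 %>% count_band_data(x, sep = " to ")
print(result2)
```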

Upvotes: 1
