Ruben Berge Mathisen

Reputation: 70

R: How to recode numeric variable into an ordinal variable with same N for each category?

Basically, I want to turn a numeric income variable into an ordinal income variable, with the cut-off points chosen so that each category ends up with the same N (or one fewer in some categories if N doesn't divide evenly by the number of categories).

Does anyone know how I can do this in R?

Upvotes: 1

Views: 692

Answers (1)

AntoniosK

Reputation: 16121

Here's an example using mtcars.

I'd suggest dplyr's ntile function, which splits your variable into a given number of groups with (as near as possible) the same number of cases.

Assume that the variable of interest is disp:

library(dplyr)

df <- mtcars %>%
  group_by(g = ntile(disp, 3)) %>%                        # split variable into 3 groups
  mutate(g_range = paste0(min(disp), "-", max(disp))) %>% # create the ranges
  ungroup()

Your updated data (df) will look like this:

# # A tibble: 32 x 13
#    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb     g g_range  
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr>    
# 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     2 146.7-301
# 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     2 146.7-301
# 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     1 71.1-145 
# 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     2 146.7-301
# 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     3 304-472  
# 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     2 146.7-301
# 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     3 304-472  
# 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     2 146.7-301
# 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     1 71.1-145 
#10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     2 146.7-301
# # ... with 22 more rows

You can check the number of cases within each group:

df %>% count(g, g_range)

# # A tibble: 3 x 3
#       g g_range       n
#   <int> <chr>     <int>
# 1     1 71.1-145     11
# 2     2 146.7-301    11
# 3     3 304-472      10
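Since the question asks for an ordinal variable, here is a rough base-R sketch of the same idea that ends with an ordered factor. This is not the ntile approach above, just an alternative under my own assumptions: the names k, r, g and disp_ord, and the labels "low", "mid" and "high", are made up for illustration. It ranks the values and cuts the ranks into k bins, so the group sizes differ by at most one, though which group gets the odd case may not match ntile.

```r
# Base-R sketch (assumption: 3 groups, hypothetical labels)
x <- mtcars$disp
k <- 3

# ties.method = "first" gives every row a unique rank 1..n,
# so equal values can end up in different groups
r <- rank(x, ties.method = "first")

# map ranks 1..n onto group numbers 1..k
g <- ceiling(r * k / length(x))

# ordered factor, so that low < mid < high
disp_ord <- factor(g, levels = 1:k,
                   labels = c("low", "mid", "high"),
                   ordered = TRUE)

table(disp_ord)
```

Breaking ties by position is what keeps the groups the same size. If equal values must stay in the same category, the groups can no longer be guaranteed to have equal N.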

Upvotes: 3
