John-Henry
John-Henry

Reputation: 1807

How to convert numbers to letters based on the actual order of values?

I think this should be a pretty easy problem, but I can't seem to figure out the answer. Essentially I want to add a prefix to the values of a column, but without using any type of conditional statement. Is this possible?

i.e., I want something that produces the same result as

mtcars %>% 
  mutate(new_cyl = str_c(case_when(cyl == 6 ~ letters[1],
                                   cyl == 4 ~ letters[2],
                                   cyl == 8 ~ letters[3]), ") ", cyl))

But without using the case_when or any other conditional.

My expected output would append something (in this case letters) to the values of a column in the order they appear. Here it looks like:

enter image description here

Upvotes: 3

Views: 327

Answers (3)

utubun
utubun

Reputation: 4520

You can use factor() for the labeling:

mtcars %>%
  mutate(
    new_cyl = paste(factor(x, unique(x), letters[seq(unique(x))]), cyl, sep = ') ')
  )

Output (head):

#    mpg cyl disp  hp drat    wt  qsec vs am gear carb new_cyl
# 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    a) 6
# 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    a) 6
# 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    b) 4
# 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    a) 6
# 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    c) 8
# 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    a) 6

Generalized as a function:

append_prefix <- function(x, sort = F){
  out <- paste(factor(x, unique(x), letters[seq(unique(x))]), x, sep = ') ')
  if(sort) out <- paste(factor(x, sort(unique(x)), letters[seq(unique(x))]), x, sep = ') ')
  return(out)
}

Upvotes: 1

camille
camille

Reputation: 16832

You can make a small lookup table of prefixes. That way, you know you're matching up cyl values to their prefixes correctly, rather than depending on the order of the data being correct. Then join the lookup to your data and paste.

library(dplyr)

lookup <- data.frame(
  cyl = c(6, 4, 8),
  prefix = letters[1:3]
)

For the case where you want to create the prefixes based on the order of the cyl column, rather than hardcoding cyl values to their prefixes, you could create the lookup table more dynamically, such as:

lookup <- data.frame(
  cyl = unique(mtcars$cyl),
  prefix = letters[seq_along(unique(mtcars$cyl))]
)

# select is just to get extra columns out of the way to show here
mtcars %>%
  select(cyl) %>%
  left_join(lookup, by = "cyl") %>%
  mutate(new_cyl = paste(prefix, cyl, sep = ") ")) %>%
  head()
#>   cyl prefix new_cyl
#> 1   6      a    a) 6
#> 2   6      a    a) 6
#> 3   4      b    b) 4
#> 4   6      a    a) 6
#> 5   8      c    c) 8
#> 6   6      a    a) 6

Upvotes: 4

tmfmnk
tmfmnk

Reputation: 39858

If you want to assign a letter to all "cyl" values:

mtcars %>%
 mutate(new_cyl = paste0(letters[cyl], ") ", cyl))

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb new_cyl
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4    f) 6
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4    f) 6
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    d) 4
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1    f) 6
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2    h) 8
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1    f) 6
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4    h) 8
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2    d) 4
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2    d) 4
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4    f) 6

To assign letters based on actual order of "cyl" values:

mtcars %>%
 mutate(dup = cumsum(!duplicated(cyl))) %>%
 group_by(cyl) %>%
 mutate(dup = first(dup),
        new_cyl = paste0(letters[dup], ") ", cyl)) %>%
 ungroup() %>%
 select(-dup) 

     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb new_cyl
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  
 1  21       6 160     110  3.9   2.62  16.5     0     1     4     4 a) 6   
 2  21       6 160     110  3.9   2.88  17.0     0     1     4     4 a) 6   
 3  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1 b) 4   
 4  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1 a) 6   
 5  18.7     8 360     175  3.15  3.44  17.0     0     0     3     2 c) 8   
 6  18.1     6 225     105  2.76  3.46  20.2     1     0     3     1 a) 6   
 7  14.3     8 360     245  3.21  3.57  15.8     0     0     3     4 c) 8   
 8  24.4     4 147.     62  3.69  3.19  20       1     0     4     2 b) 4   
 9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2 b) 4   
10  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4 a) 6

Here it, first, creates a cumulative sum of non-duplicate "cyl" values. Second, it groups by "cyl". Finally, it takes the first value of cumulative sum of non-duplicate values and assigns it a letter.

Upvotes: 2

Related Questions