dss
dss

Reputation: 467

Group by multiple columns and ordinally label groups?

I need to group by col1, and for each unique value in col2, give an integer starting at 1 and going up from there, but starting back at 1 for the next unique value in col1. It's easier to see in an example:

Example

Here's a data.frame.

df <- data.frame(
  col1=c(100, 100, 100, 100, 200, 200, 300, 300, 300),
  col2=c("a", "b", "c", "c",
               "b", "b", 
               "a", "c", "c")
)

df

#   col1 col2
# 1  100    a
# 2  100    b
# 3  100    c
# 4  100    c
# 5  200    b
# 6  200    b
# 7  300    a
# 8  300    c
# 9  300    c

For each unique value (i.e. group) in col1, each unique value (e.g. group) in col2 should be numbered ordinally:

Desired output:

df %>% 
  cbind(col3 =c(1, 2, 3, 3, 
          1, 1, 
          1, 2, 2))


#   col1 col2 col3
# 1  100    a    1
# 2  100    b    2
# 3  100    c    3
# 4  100    c    3
# 5  200    b    1
# 6  200    b    1
# 7  300    a    1
# 8  300    c    2
# 9  300    c    2

Note that for each group of col1, each new value in col2 is given its own integer.

Upvotes: 1

Views: 199

Answers (2)

TarJae
TarJae

Reputation: 78947

We could use rleid from data.table

library(data.table)
library(dplyr)
df %>% 
  group_by(col1) %>%
  mutate(new_col = rleid(col2))

Output:

   col1 col2  new_col
  <dbl> <chr>   <int>
1   100 a           1
2   100 b           2
3   100 c           3
4   100 c           3
5   200 b           1
6   200 b           1
7   300 a           1
8   300 c           2
9   300 c           2

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389055

Here are 3 ways to do it -

library(dplyr)

df %>%
  group_by(col1) %>%
  mutate(col3 = match(col2, unique(col2)), 
         col4 = as.integer(factor(col2)), 
         col5 = dense_rank(col2)) %>%
  ungroup

#   col1 col2   col3  col4  col5
#  <dbl> <chr> <int> <int> <int>
#1   100 a         1     1     1
#2   100 b         2     2     2
#3   100 c         3     3     3
#4   100 c         3     3     3
#5   200 b         1     1     1
#6   200 b         1     1     1
#7   300 a         1     1     1
#8   300 c         2     2     2
#9   300 c         2     2     2

Upvotes: 3

Related Questions