hbabbar

Reputation: 967

How to count different groups using dplyr in R

I have the following df structure:

category difference factor
a        -0.12      1
a        -0.12      2
b        -0.17      3
b        -0.21      4

I want to give each category its own identifying number and rank the rows within each category by decreasing difference. The expected result is something like this:

category difference factor catCount rank
a        -0.12      1      2        2
a        -0.12      2      2        1
b        -0.17      3      1        2
b        -0.21      4      1        1

I'm using the following code to achieve this:

df %>%
  group_by(category) %>%
  mutate(categoryNumber = n_distinct(category)) %>%
  mutate(rank = rank(difference, ties.method = 'last'))

but I get this output:

category difference factor catCount rank
a        -0.12      1      2        2
a        -0.12      2      2        1
b        -0.17      3      2        2
b        -0.21      4      2        1

Any suggestions for this?

Upvotes: 1

Views: 116

Answers (2)

Ronak Shah

Reputation: 388797

Counting n_distinct(category) within each category always gives 1, because every group contains exactly one category. Try this:

library(dplyr)

df %>% 
  arrange(category, difference) %>%
  group_by(category) %>% 
  mutate(catCount = cur_group_id(), 
         rank = row_number()) %>%
  ungroup()

#  category difference factor catCount  rank
#  <chr>         <dbl>  <int>    <int> <int>
#1 a             -0.12      1        1     1
#2 a             -0.12      2        1     2
#3 b             -0.21      4        2     1
#4 b             -0.17      3        2     2

Here catCount is a unique number for each category, whereas rank numbers the rows within each category after sorting by difference, so the smallest (most negative) difference gets rank 1.
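
To see the difference between the two functions, here is a minimal sketch that rebuilds the sample data from the question and computes both values side by side. The column names nDistinct and groupId and the result name res are made up for this illustration, and cur_group_id() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data rebuilt from the question
df <- data.frame(
  category   = c("a", "a", "b", "b"),
  difference = c(-0.12, -0.12, -0.17, -0.21),
  factor     = 1:4
)

res <- df %>%
  group_by(category) %>%
  mutate(
    nDistinct = n_distinct(category),  # each group holds one category, so this is 1
    groupId   = cur_group_id()         # index of the current group: 1 for "a", 2 for "b"
  ) %>%
  ungroup()

res
```

nDistinct is 1 on every row, while groupId changes from one category to the next.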

Upvotes: 0

AnilGoyal

Reputation: 26218

Use this:

df %>% group_by(category, catcnt = dense_rank(desc(category))) %>% 
  mutate(rank = rank(difference, ties.method = 'last'))

# A tibble: 4 x 5
# Groups:   category [2]
  category difference factor catcnt  rank
  <chr>         <dbl>  <int>  <int> <int>
1 a             -0.12      1      2     2
2 a             -0.12      2      2     1
3 b             -0.17      3      1     2
4 b             -0.21      4      1     1
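
A small sketch of the two building blocks used above, run on plain vectors so their effect is easier to see. The variable names tie_ranks and cat_ids are made up for this illustration.

```r
library(dplyr)

# ties.method = "last" hands tied values decreasing ranks in order of appearance
tie_ranks <- rank(c(-0.12, -0.12), ties.method = "last")
tie_ranks  # 2 1

# desc() reverses the ordering, so the alphabetically last category gets id 1
cat_ids <- dense_rank(desc(c("a", "a", "b", "b")))
cat_ids    # 2 2 1 1
```

This is why category a gets catcnt 2 and its two tied rows get ranks 2 and 1, matching the expected result in the question.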

Upvotes: 3
