Reputation: 967
I have the following df structure:
category difference factor
a -0.12 1
a -0.12 2
b -0.17 3
b -0.21 4
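To make the question reproducible, the sample data can be rebuilt in R. This is a minimal sketch. I'm assuming a plain `data.frame()` with `stringsAsFactors = FALSE` so that `category` is a character column, as the `<chr>` type in the answers suggests. The values are copied from the table above.

```r
# Rebuild the question's sample data (values copied from the printed table)
df <- data.frame(
  category   = c("a", "a", "b", "b"),
  difference = c(-0.12, -0.12, -0.17, -0.21),
  factor     = 1:4,               # integer column, shown as <int> in the answers
  stringsAsFactors = FALSE        # keep category as character, not factor
)

print(df)
```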
I want to categorise this data so that I can identify each category by a number and rank the rows within each category according to decreasing differences. The expected result is something like this:
category difference factor catCount rank
a -0.12 1 2 2
a -0.12 2 2 1
b -0.17 3 1 2
b -0.21 4 1 1
I'm using the following code to achieve this:
df %>%
  group_by(category) %>%
  mutate(catCount = n_distinct(category)) %>%
  mutate(rank = rank(difference, ties.method = 'last'))
but I'm getting this output:
category difference factor catCount rank
a -0.12 1 2 2
a -0.12 2 2 1
b -0.17 3 2 2
b -0.21 4 2 1
Any suggestions for this?
Upvotes: 1
Views: 116
Reputation: 388797
Counting n_distinct(category) within each category will always give 1. Try this:
library(dplyr)
df %>%
  arrange(category, difference) %>%
  group_by(category) %>%
  mutate(catCount = cur_group_id(),
         rank = row_number()) %>%
  ungroup()
# category difference factor catCount rank
# <chr> <dbl> <int> <int> <int>
#1 a -0.12 1 1 1
#2 a -0.12 2 1 2
#3 b -0.21 4 2 1
#4 b -0.17 3 2 2
Here catCount is a unique number for each category, whereas rank is the rank based on decreasing differences.
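To see why n_distinct() doesn't help here, you can compute it next to cur_group_id() inside the same grouped mutate(). This is a small sketch assuming dplyr >= 1.0.0, which is where cur_group_id() was introduced. The column name `n_cat_in_group` is hypothetical and used only for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  category   = c("a", "a", "b", "b"),
  difference = c(-0.12, -0.12, -0.17, -0.21),
  factor     = 1:4,
  stringsAsFactors = FALSE
)

demo <- df %>%
  group_by(category) %>%
  mutate(
    # Each group holds a single category value, so this is always 1
    n_cat_in_group = n_distinct(category),
    # cur_group_id() numbers the groups 1, 2, ... in sorted key order ("a", then "b")
    catCount = cur_group_id()
  ) %>%
  ungroup()

print(demo)
```

Because `group_by()` already splits the rows by category, any count of distinct categories taken inside a group can only see one value. A per-category number has to come from the grouping itself (cur_group_id()) or be computed before or outside the grouping.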
Upvotes: 0
Reputation: 26218
Use this:
library(dplyr)
df %>%
  group_by(category, catcnt = dense_rank(desc(category))) %>%
  mutate(rank = rank(difference, ties.method = 'last'))
# A tibble: 4 x 5
# Groups: category [2]
category difference factor catcnt rank
<chr> <dbl> <int> <int> <int>
1 a -0.12 1 2 2
2 a -0.12 2 2 1
3 b -0.17 3 1 2
4 b -0.21 4 1 1
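To see what the two key pieces of this answer do on their own, you can evaluate them on plain vectors copied from the question's data. This is a minimal sketch. The comments describe how I understand dplyr's desc()/dense_rank() and base R's rank() to behave, and the variable names are hypothetical.

```r
library(dplyr)

categories <- c("a", "a", "b", "b")

# desc() reverses the sort order (it also works on character vectors), and
# dense_rank() assigns consecutive ranks with no gaps, so "b" -> 1 and "a" -> 2
cat_ids <- dense_rank(desc(categories))
print(cat_ids)

# rank() with ties.method = "last": tied values get decreasing ranks in order
# of appearance, which is why the two "a" rows come out as 2 and 1
tie_ranks <- rank(c(-0.12, -0.12), ties.method = "last")
print(tie_ranks)
```

Putting `catcnt = dense_rank(desc(category))` inside `group_by()` computes the id on the whole data frame before grouping. That is why it reproduces the reversed numbering (a = 2, b = 1) from the expected result.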
Upvotes: 3