MoRe

Reputation: 1538

Sorting data frame by column, adding index within group

This question describes the setting for my question pretty well.

Instead of a second value, however, I have a factor called algorithm. My data frame looks like the following (note that values can repeat, even within a group):

algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df
    algorithm  v
1      global  5
2 distributed  2
3 distributed  6
4        none  7
5      global  3
6      global  1
7 distributed 10
8        none  2
9        none  2

I would like to rank the rows by v and get each entry's position within its group (algorithm). This position should be added to the original data frame as a new column, so the rows do not need to be rearranged. I want to plot the calculated position as x and the value as y with ggplot, grouped by algorithm (i.e. every algorithm is one set of points).

So the result should look like this:

    algorithm  v  groupIndex
1      global  5  3
2 distributed  2  1
3 distributed  6  2
4        none  7  3
5      global  3  2
6      global  1  1
7 distributed 10  3
8        none  2  1
9        none  2  2

So far I know I can order the data by algorithm first and then by value or the other way round. I guess in a second step I would have to calculate the index within each group? Is there an easy way to do that?

df[order(df$algorithm, df$v), ]
    algorithm  v
2 distributed  2
3 distributed  6
7 distributed 10
6      global  1
5      global  3
1      global  5
8        none  2
9        none  2
4        none  7

Edit: It is not guaranteed that every group has the same number of entries!

Upvotes: 1

Views: 1140

Answers (2)

thelatemail

Reputation: 93813

A double application of order in each group should cover it:

ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2

This is equivalent to:

ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
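
To see why the double order matches rank with ties.method="first", here is a small sketch on a single group containing a tie (the "none" values from the question). The variable names o, r and the column name groupIndex are only illustrative; groupIndex is taken from the desired output in the question.

```r
# One group's values, including a tie (the "none" group from the question)
x <- c(7, 2, 2)

# order(x) gives the positions of the elements in sorted order
o <- order(x)

# order(o) inverts that permutation: for each original element,
# its position in the sorted sequence, i.e. its rank
r <- order(o)

# Ties are broken by first appearance, same as rank(..., ties.method = "first")
same_as_rank <- all(r == rank(x, ties.method = "first"))

# Applied per group and stored as a new column, without reordering the rows
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df$groupIndex <- ave(df$v, df$algorithm, FUN = function(x) order(order(x)))
```

Since ave() works group by group, this does not depend on the groups having the same size.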

Because the result is a rank with ties broken by first appearance, you can use frank from data.table if you are concerned about speed:

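library(data.table)
# setDT() converts df to a data.table by reference; := adds the column in place
setDT(df)[, grpidx := frank(v, ties.method="first"), by=algorithm]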
df
#     algorithm  v grpidx
#1:      global  5      3
#2: distributed  2      1
#3: distributed  6      2
#4:        none  7      3
#5:      global  3      2
#6:      global  1      1
#7: distributed 10      3
#8:        none  2      1
#9:        none  2      2

Upvotes: 3

jazzurro

Reputation: 23574

One way would be the following. with_order() sorts the v values within each group, applies row_number() to the sorted values to assign ranks, and returns the result in the original row order. This way, you can skip the step of arranging your data for each group as you tried with order().

library(dplyr)
group_by(df, algorithm) %>%
  mutate(groupInd = with_order(order_by = v, fun = row_number, x = v))

#    algorithm     v groupInd
#       <fctr> <int>    <int>
#1      global     5        3
#2 distributed     2        1
#3 distributed     6        2
#4        none     7        3
#5      global     3        2
#6      global     1        1
#7 distributed    10        3
#8        none     2        1
#9        none     2        2
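
As a possible shortcut, row_number() can also be given v directly, since dplyr documents row_number(x) as equivalent to rank(x, ties.method = "first"). The sketch below assumes the data frame from the question; the name res is only illustrative.

```r
library(dplyr)

# Data from the question
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# Rank v within each algorithm; ties are broken by first appearance
res <- df %>%
  group_by(algorithm) %>%
  mutate(groupInd = row_number(v)) %>%
  ungroup()
```

The rows stay in their original order, so groupInd can be used directly as the x aesthetic in a plot.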

Upvotes: 2
