Reputation: 1538
This question describes the setting for my question pretty well. Instead of a second value, however, I have a factor called algorithm. My data frame looks like the following (note that values may be duplicated, even within a group):
algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df
algorithm v
1 global 5
2 distributed 2
3 distributed 6
4 none 7
5 global 3
6 global 1
7 distributed 10
8 none 2
9 none 2
I would like to sort the data frame by v but get the ordering position of every entry with respect to its group (algorithm). This position should then be added to the original data frame (so I don't need to rearrange it), because I would like to plot the calculated position as x and the value as y using ggplot (grouped by algorithm, i.e. every algorithm is one set of points).
So the result should look like this:
algorithm v groupIndex
1 global 5 3
2 distributed 2 1
3 distributed 6 2
4 none 7 3
5 global 3 2
6 global 1 1
7 distributed 10 3
8 none 2 1
9 none 2 2
So far I know I can order the data by algorithm first and then by value or the other way round. I guess in a second step I would have to calculate the index within each group? Is there an easy way to do that?
df[order(df$algorithm, df$v), ]
algorithm v
2 distributed 2
3 distributed 6
7 distributed 10
6 global 1
5 global 3
1 global 5
8 none 2
9 none 2
4 none 7
Edit: It is not guaranteed that every group has the same number of entries!
Upvotes: 1
Views: 1140
Reputation: 93813
A double application of order in each group should cover it:
ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2
Which is also equivalent to:
ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
This in turn means you can take advantage of frank from data.table if you are concerned about speed:
library(data.table)
setDT(df)[, grpidx := frank(v, ties.method="first"), by=algorithm]
df
# algorithm v grpidx
#1: global 5 3
#2: distributed 2 1
#3: distributed 6 2
#4: none 7 3
#5: global 3 2
#6: global 1 1
#7: distributed 10 3
#8: none 2 1
#9: none 2 2
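To see why the double order() works, it may help to walk through a single group by hand. The sketch below rebuilds the question's data frame, steps through the "none" group, and then stores the result in a groupIndex column. The uneven data frame at the end is a made-up example, added only to illustrate that groups of different sizes are handled as well.

```r
# Data from the question
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# One group by hand: the values of "none" in their original row order
x <- df$v[df$algorithm == "none"]  # 7 2 2
o <- order(x)   # positions that would sort x; ties keep their original order
r <- order(o)   # inverting that permutation gives each element's rank

# The same idea applied per group, written back in the original row order.
# ave() returns a vector of the same type as df$v, so the index is numeric.
df$groupIndex <- ave(df$v, df$algorithm, FUN = function(x) order(order(x)))

# Hypothetical data with unequal group sizes (not from the question)
uneven <- data.frame(algorithm = c("a", "a", "a", "b"), v = c(3, 1, 2, 9))
uneven$groupIndex <- ave(uneven$v, uneven$algorithm,
                         FUN = function(x) order(order(x)))
```

Because order() breaks ties by original position, the two 2s in "none" get the indices 1 and 2 in row order, which is the same behaviour as rank() with ties.method="first".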
Upvotes: 3
Reputation: 23574
One way would be the following. I think you can order the v values within each group using with_order(), passing row_number() as the function that assigns the ranks. This way, you can skip the step of arranging your data by group that you attempted with order().
library(dplyr)
group_by(df, algorithm) %>%
mutate(groupInd = with_order(order_by = v, fun = row_number, x = v))
# algorithm v groupInd
# <fctr> <int> <int>
#1 global 5 3
#2 distributed 2 1
#3 distributed 6 2
#4 none 7 3
#5 global 3 2
#6 global 1 1
#7 distributed 10 3
#8 none 2 1
#9 none 2 2
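As a possible simplification, dplyr's row_number() can also be called on v directly, since row_number(x) ranks x with ties broken by first occurrence. The sketch below uses that form and then draws the plot described in the question. The names ranked and p, and the choice of geom_point() with colour mapped to algorithm, are assumptions made for illustration and not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Data from the question
df <- data.frame(
  algorithm = c("global", "distributed", "distributed", "none", "global",
                "global", "distributed", "none", "none"),
  v = c(5, 2, 6, 7, 3, 1, 10, 2, 2)
)

# Rank v within each algorithm; rows keep their original order
ranked <- df %>%
  group_by(algorithm) %>%
  mutate(groupInd = row_number(v)) %>%
  ungroup()

# Position within the group on x, value on y, one set of points per algorithm
p <- ggplot(ranked, aes(x = groupInd, y = v, colour = algorithm)) +
  geom_point()
print(p)
```

The ungroup() call is there so that later operations on ranked are not silently applied per group.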
Upvotes: 2