Reputation: 43
I have an issue with sequentially updating rankings and no matter how I try to search for a solution - or come up with one myself - I fail.
I am trying to analyse results of an experiment of sequential choice in which participants had to find the best possible option (the option with the highest rating). They were presented with a rating in every trial.
I have an ID, an order and a rating variable for every choice. ID is the participant, rating represents how good the option is (the higher the rating the better) and order is the number of the trial (in this example there were 4 trials)
ID rating order
1 4 1
1 3 2
1 5 3
1 2 4
2 3 1
2 5 2
2 2 3
2 1 4
I would like to create a new variable called "current_rank" which is basically the ranking of the rating of the current choice. This variable always needs to take into consideration all previous trials and ratings so e.g. for the participant with ID "1" this would be:
Trial 1: rating = 4, which means this is the best rating so far, current_rank = 1
Trial 2: rating = 3, which means this is the second best rating so far, current_rank = 2
Trial 3: rating = 5, which means this is the best rating so far, making it the new number 1 so, current_rank = 1
Trial 4: rating = 2, which means this is nowhere near the best, current_rank = 4
If I could do this with all participants and all choices my database should look like this:
ID rating order current_rank
1 4 1 1
1 3 2 2
1 5 3 1
1 2 4 4
2 3 1 1
2 5 2 1
2 2 3 3
2 1 4 4
I could successfully create an overall ranking variable like this:
db %>%
arrange(ID, order) %>%
group_by(ID) %>%
mutate(ovr_rank = min_rank(desc(rating)))
But my goal is to create a variable that is something of a sequential ranking. This would make it possible to see what kind of opinion the participant may have formed about the current rating based on the previous ratings, without knowing what future ratings might be. I tried creating loops or use the apply functions, but couldn't come up with a solution yet.
Any and all ideas are greatly appreciated!
Upvotes: 4
Views: 100
Reputation: 25225
Here are 2 options using data.table
:
1) non-equi join to find all trials before and incl current trial, rank the rating and extract the current rank:
DT[, cr := .SD[.SD, on=.(ID, trial<=trial), by=.EACHI, order(order(-rating))[.N]]$V1]
2) non-equi join to find number of ratings that are higher than current rating in trials before current trial:
DT[, cr2 := DT[DT, on=.(ID, trial<=trial, rating>rating), by=.EACHI, .N + 1L]$V1]
Note that there might be ties in ratings and it will be good to specify how ratings ties should be handled.
output:
ID rating trial cr cr2
1: 1 4 1 1 1
2: 1 3 2 2 2
3: 1 5 3 1 1
4: 1 2 4 4 4
5: 2 3 1 1 1
6: 2 5 2 1 1
7: 2 2 3 3 3
8: 2 1 4 4 4
data:
library(data.table)
DT <- fread("ID rating trial
1 4 1
1 3 2
1 5 3
1 2 4
2 3 1
2 5 2
2 2 3
2 1 4")
Upvotes: 1
Reputation: 2867
Use runner to apply any R function in cumulative window (or rolling window). Below I used runner
which rolls rating and applies rank function on "available" data at the moment (cumulative rank). Uncomment print
to exhibit what lands into function(x)
.
library(dplyr)
library(runner)
data %>%
arrange(ID, order) %>%
group_by(ID) %>%
mutate(
current_rank = runner(
x = rating,
f = function(x) {
# print(x)
rank_available_at_the_moment <- rank(-x, ties.method = "last")
tail(rank_available_at_the_moment, 1)
}
)
)
# # A tibble: 8 x 4
# # Groups: ID [2]
# ID rating order current_rank
# <int> <int> <int> <int>
# 1 1 4 1 1
# 2 1 3 2 2
# 3 1 5 3 1
# 4 1 2 4 4
# 5 2 3 1 1
# 6 2 5 2 1
# 7 2 2 3 3
# 8 2 1 4 4
data
data <- read.table(text = "ID rating order
1 4 1
1 3 2
1 5 3
1 2 4
2 3 1
2 5 2
2 2 3
2 1 4", header = TRUE)
Upvotes: 2
Reputation: 570
This chunk of code will work:
df <- tibble(
ID = c(1,1,1,1,2,2,2,2),
rating = c(4,3,5,2,3,5,2,1),
rank = c(1,0,0,0,0,0,0,0)
)
for(i in 2:nrow(df)){
if(df$ID[i] != df$ID[i-1]){
df$rank[i] <- 1
} else {
df$rank[i] <- which(sort(df[1:i,]$rating[which(df$ID == df$ID[i])], decreasing = TRUE) == df$rating[i])
}
}
Explanation:
Note that I assume your dataframe is already ordered based on ID and order. In my df
there is no order
column, but it is mainly for simplicity (and it is not necessarily needed in my solution, again, assuming the rows are already ordered by ID and order).
The for
loop simply looks if the ID of that row is different from the row above, it automatically gets rank 1. Otherwise, it looks on the subset of df
from row 1 to row i
, subsets again by similar ID, sorts the ratings in that subset (including our current rating in question) in descending order, and takes the position of our currently asked rating to be assigned as its rank value.
I hope this answers your question and gives you insight.
Upvotes: 1