Reputation: 408
I want to improve my processing time by replacing some for loops with a vectorized alternative.
Below is a simplified example of what I want to do with a much bigger dataset.
df <- data.frame(time = c(10, 12, 14, 14, 14, 17, 23, 23, 30, 32), ranks = vector(mode = 'double', length = 10))
df_hilf <- data.frame(time_hilf = c(10, 12, 14, 17, 23, 30, 32), ranking_hilf = c(1, 2, 4, 6, 7.5, 9, 10))
# Copy each rank from df_hilf to every row of df with the same time
for (j in seq_len(nrow(df_hilf))) {
  df$ranks[df$time == df_hilf$time_hilf[j]] <- df_hilf$ranking_hilf[j]
}
I've generated a dataframe called df which is ordered by time. The goal is to assign the ranks of another dataframe (in this example called df_hilf) to the initial dataframe.
As you can see the dataframes differ in length because in df_hilf only the unique times of df are stored.
The ranks stored in df_hilf are calculated by a specific rule (adjusted ranks as used in reliability analysis). For simplicity I've used midranks in this example, but in practice I really need the specific ranks stored in df_hilf.
In the end, rows of df with the same time value should have the same rank:
> df
   time ranks
1    10   1.0
2    12   2.0
3    14   4.0
4    14   4.0
5    14   4.0
6    17   6.0
7    23   7.5
8    23   7.5
9    30   9.0
10   32  10.0
I think this could work with the function replicate, but I haven't found out how to set the n argument, since the number of occurrences of each time value differs.
Unfortunately I also have not found a solution to this problem on the net. I apologize if I have overlooked something.
Upvotes: 2
Views: 325
Reputation: 23788
You could use match():
df$ranks <- df_hilf$ranking_hilf[match(df$time, df_hilf$time_hilf)]
#> df
#   time ranks
#1    10   1.0
#2    12   2.0
#3    14   4.0
#4    14   4.0
#5    14   4.0
#6    17   6.0
#7    23   7.5
#8    23   7.5
#9    30   9.0
#10   32  10.0
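To see why this works, it may help to look at the intermediate index vector. The sketch below uses the data from the question; the names idx, ranks_match and ranks_rep are only illustrative. It also shows a rep()-based variant closer to the repetition idea in the question, which assumes that df is sorted by time and that df_hilf lists the unique times in the same order.

```r
# Data from the question
df <- data.frame(time = c(10, 12, 14, 14, 14, 17, 23, 23, 30, 32))
df_hilf <- data.frame(time_hilf = c(10, 12, 14, 17, 23, 30, 32),
                      ranking_hilf = c(1, 2, 4, 6, 7.5, 9, 10))

# match() returns, for each df$time, the row of df_hilf holding that time
idx <- match(df$time, df_hilf$time_hilf)
idx
#> [1] 1 2 3 3 3 4 5 5 6 7

# Indexing with idx repeats each rank as often as its time occurs in df
ranks_match <- df_hilf$ranking_hilf[idx]

# Alternative using rep() with run lengths as the repetition counts.
# Assumption: df is sorted by time and df_hilf has the unique times
# in the same order; match() does not need this assumption.
ranks_rep <- rep(df_hilf$ranking_hilf, times = rle(df$time)$lengths)

identical(ranks_match, ranks_rep)
#> [1] TRUE
```

Note that match() returns NA for any time in df that has no entry in df_hilf, so such rows would end up with an NA rank rather than an error.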
Upvotes: 4