Reputation: 93
Here is my dataset:
df <- data.frame(label = c(1,2,3,4,5), measurement = c(100.5, 84.7, 100.7, 77.9, 98.8), size = c(20, 19, 20, 20, 15))
Now I want to be able to obtain the label (from label column) that has the maximum size. However, as in the example above, three of the labels have the maximum value of 20. I want my tie breaker to be the measurement values. So in this case, out of the three values of 20 from the size column, measurement of 100.7 is the greatest.
So I would run the code and it would return 3 has the label I should go with. How can I do this across columns?
Upvotes: 1
Views: 1884
Reputation: 72613
It's probably wise to work with rank
here, to account for duplicates (in contrast to order
, which selects the first maximum and dplyr::last
the last, thus omitting the other maxima).
Think of this data frame with duplicates in the value columns:
df
# label measurement size
# 1 1 100.5 20
# 2 2 84.7 19
# 3 3 100.7 20 ## !
# 4 4 77.9 20
# 5 5 98.8 15
# 6 6 100.7 20 ## !
We would add the rowSums
of the rank
s of the value columns and add the rank
of the "tie breaker" size and finally subset the levels on the max
ima.
rankv <- Vectorize(rank)
R <- rankv(df[-1]) ## or `apply(df[-1], 2, rank)`
r <- rowSums(R) + rank(R[,2])
df$label[r == max(r)]
# [1] 3 6
Data:
df <- structure(list(label = c(1, 2, 3, 4, 5, 6), measurement = c(100.5,
84.7, 100.7, 77.9, 98.8, 100.7), size = c(20, 19, 20, 20, 15,
20)), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 0
Reputation: 33498
In one line in base R:
df[order(df$size, df$measurement, decreasing = TRUE)[1], "label"]
# [1] 3
Upvotes: 1
Reputation:
The filter()
function from dplyr
will operate sequentially when specified as comma separated arguments. You can do something like this to filter to the observation and then pull()
the label.
This method will give you multiple responses if the tiebreaker is also tied.
library(dplyr)
df %>%
filter(size == max(size), measurement == max(measurement)) %>%
pull(label)
Upvotes: 1
Reputation: 388817
You can order
the dataframe in descending order based on size
and measurement
column and then extract the first label
from it.
df1 <- df[with(df, order(-size, -measurement)), ]
df1$label[1]
#[1] 3
Or sort them in ascending order and extract the last value. Using dplyr
:
library(dplyr)
df %>%
arrange(size, measurement) %>%
pull(label) %>% last
#[1] 3
Upvotes: 1