Sanam
Sanam

Reputation: 93

How to filter for value in one column based on values from other columns in R?

Here is my dataset:

df <- data.frame(label = c(1,2,3,4,5), measurement = c(100.5, 84.7, 100.7, 77.9, 98.8), size = c(20, 19, 20, 20, 15)) 

Now I want to be able to obtain the label (from label column) that has the maximum size. However, as in the example above, three of the labels have the maximum value of 20. I want my tie breaker to be the measurement values. So in this case, out of the three values of 20 from the size column, measurement of 100.7 is the greatest.

So I would run the code and it would return 3 has the label I should go with. How can I do this across columns?

Upvotes: 1

Views: 1884

Answers (4)

jay.sf
jay.sf

Reputation: 72613

It's probably wise to work with rank here, to account for duplicates (in contrast to order, which selects the first maximum and dplyr::last the last, thus omitting the other maxima).

Think of this data frame with duplicates in the value columns:

df
#   label measurement size
# 1     1       100.5   20
# 2     2        84.7   19
# 3     3       100.7   20  ## !
# 4     4        77.9   20
# 5     5        98.8   15
# 6     6       100.7   20  ## !

We would add the rowSums of the ranks of the value columns and add the rank of the "tie breaker" size and finally subset the levels on the maxima.

rankv <- Vectorize(rank)
R <- rankv(df[-1])  ## or `apply(df[-1], 2, rank)`
r <- rowSums(R) + rank(R[,2])
df$label[r == max(r)]
# [1] 3 6

Data:

df <- structure(list(label = c(1, 2, 3, 4, 5, 6), measurement = c(100.5, 
84.7, 100.7, 77.9, 98.8, 100.7), size = c(20, 19, 20, 20, 15, 
20)), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 0

s_baldur
s_baldur

Reputation: 33498

In one line in base R:

df[order(df$size, df$measurement, decreasing = TRUE)[1], "label"]
# [1] 3

Upvotes: 1

user10917479
user10917479

Reputation:

The filter() function from dplyr will operate sequentially when specified as comma separated arguments. You can do something like this to filter to the observation and then pull() the label.

This method will give you multiple responses if the tiebreaker is also tied.

library(dplyr)

df %>% 
  filter(size == max(size), measurement == max(measurement)) %>% 
  pull(label)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

You can order the dataframe in descending order based on size and measurement column and then extract the first label from it.

df1 <- df[with(df, order(-size, -measurement)), ]
df1$label[1]
#[1] 3

Or sort them in ascending order and extract the last value. Using dplyr :

library(dplyr)

df %>%
  arrange(size, measurement) %>%
  pull(label) %>% last
#[1] 3

Upvotes: 1

Related Questions