Abhishek

Reputation: 437

How to create 3 new columns holding the top 3 highest values, calculated row-wise?

How can we create new columns that hold the highest values of each row?

References:

https://rdrr.io/cran/dplyr/man/top_n.html

Selecting top N values within a group in a column using R

For example:

library(tidyverse)

iris %>% glimpse()

# my attempt
x = iris %>% 
  select(-Species) %>%
  gather(measure, values)
# got stuck from here on: what goes on the right-hand sides?
#   mutate(top_1 = ,
#          top_2 = ,
#          top_3 = )

# expected_output has the same number of rows as the input (pseudocode):
# expected_output = iris %>%
#   mutate(top_1 = <1st highest value in the row>,
#          top_2 = <2nd highest value in the row>,
#          top_3 = <3rd highest value in the row>)



# the first 3 rows of the expected output look like this:
iris[1:3,] %>% 
  mutate(top_1 = c(5.1,4.9,4.7), top_2 = c(3.5,3.0,3.2), top_3 = c(1.4,1.4,1.3))

Upvotes: 1

Views: 191

Answers (1)

Ronak Shah

Reputation: 388817

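We can use apply row-wise: sort each row's values in decreasing order and take the top 3 with head. apply returns one column per input row, so the result is transposed with t before it is assigned.

df <- iris
# df[-5] drops the non-numeric Species column; MARGIN = 1 iterates over rows
df[paste0("top_", 1:3)] <- t(apply(df[-5], 1, function(x) 
                             head(sort(x, decreasing = TRUE), 3)))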

head(df)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species top_1 top_2 top_3
#1          5.1         3.5          1.4         0.2  setosa   5.1   3.5   1.4
#2          4.9         3.0          1.4         0.2  setosa   4.9   3.0   1.4
#3          4.7         3.2          1.3         0.2  setosa   4.7   3.2   1.3
#4          4.6         3.1          1.5         0.2  setosa   4.6   3.1   1.5
#5          5.0         3.6          1.4         0.2  setosa   5.0   3.6   1.4
#6          5.4         3.9          1.7         0.4  setosa   5.4   3.9   1.7
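
To make the role of t() concrete, here is a minimal sketch on a made-up two-row data frame. The name toy and its values are hypothetical and only serve as an illustration; they are not part of the question.

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(a = c(1, 9), b = c(5, 2), c = c(3, 7), d = c(8, 4))

# Top 3 values of each row; apply() returns each row's result as a COLUMN
top3 <- apply(toy, 1, function(x) head(sort(x, decreasing = TRUE), 3))
dim(top3)   # 3 rows (the top values) by 2 columns (the original rows)

# Transpose so there is one row per original row, then attach as new columns
toy[paste0("top_", 1:3)] <- t(top3)
toy
```

Without the transpose, the assignment would try to place a 3 x 2 matrix into a two-row data frame, so the dimensions would not line up.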

A tidyverse alternative, which reshapes the data to long format, keeps the top 3 values per row, and spreads them back to wide format:

library(dplyr)
library(tidyr)

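# Note: the result holds only the three new columns
iris %>%
  mutate(row = row_number()) %>%
  select(-Species) %>%
  gather(key, value, -row) %>%
  group_by(row) %>%
  # sort within each row so that top_1 really is the largest value
  arrange(desc(value), .by_group = TRUE) %>%
  # slice() keeps exactly 3 values; top_n(3, value) keeps extra rows on ties
  slice(1:3) %>%
  mutate(key = paste0("top_", 1:3)) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-row)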
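
The same reshaping idea can also be written with the newer pivot functions while keeping the original columns, as the expected output in the question requires. This is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed; the names top3 and result are illustrative and not from the original answer.

```r
library(dplyr)
library(tidyr)

top3 <- iris %>%
  mutate(row = row_number()) %>%
  select(-Species) %>%
  pivot_longer(-row, names_to = "key", values_to = "value") %>%
  group_by(row) %>%
  # order each row's values from largest to smallest, then keep the first 3
  arrange(desc(value), .by_group = TRUE) %>%
  slice_head(n = 3) %>%
  mutate(key = paste0("top_", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value) %>%
  # make sure the rows are back in the original order before binding
  arrange(row) %>%
  select(-row)

# Attach the three new columns to the untouched input (matched by position)
result <- bind_cols(iris, top3)
head(result, 3)
```

Binding by position relies on top3 having one row per input row in the original order, which is why the rows are re-sorted by row before the helper column is dropped.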

Upvotes: 3
