DWFotos
DWFotos

Reputation: 33

In R, how can I transform values in a data frame according to their rank in that row?

Let's say I have a data frame in R that contains a column with sample names and three additional columns of different treatments that those samples received. The values in the treatment columns correspond to days that the treatment was given or have NA if they did not receive that treatment. How can I transform each row so that the numeric values are replaced by the order that the treatment was received. In other words, the column with the lowest value/day gets mutated into "first_treatment", the second lowest value into "second_treatment" and so on.

Example Input Data:

df = data.frame(samples = LETTERS[1:5],
            Treatment_A = c(NA, "3", NA, "10", "12"),
            Treatment_B = c("12", NA, NA, "15", "5"),
            Treatment_C = c(NA, NA, "5", "8", NA))

Desired Output:

df_output = data.frame(samples = LETTERS[1:5],
            Treatment_A = c(NA, "First", NA, "Second","Second"),
            Treatment_B = c("First", NA, NA, "Third","First"),
            Treatment_C = c(NA, NA, "First", "First", NA))

*edit: fixed what the output should look like.

Upvotes: 1

Views: 91

Answers (4)

Andre Wildberg
Andre Wildberg

Reputation: 19191

An approach using cut for grouping

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-samples) %>% 
  mutate(value = cut(as.numeric(value), 
                   breaks = seq(min(as.numeric(value), na.rm=T), 
                                max(as.numeric(value), na.rm=T), 
                                length.out=4), 
                   include.lowest=T, 
                   labels=c("First", "Second", "Third"))) %>% 
  pivot_wider()
# A tibble: 5 × 4
  samples Treatment_A Treatment_B Treatment_C
  <chr>   <fct>       <fct>       <fct>      
1 A       NA          Third       NA         
2 B       First       NA          NA         
3 C       NA          NA          First      
4 D       Second      Third       Second     
5 E       Third       First       NA

Upvotes: 0

jay.sf
jay.sf

Reputation: 73572

We can create a small assignment vector, calculate rowRanks and match them with it.

> v <- c(first=1, second=2, third=3)
> m <- matrixStats::rowRanks(as.matrix(type.convert(df[-1], as.is=TRUE)))
> m[] <- names(v)[match(m, v)]
> data.frame(df['samples'], m)
     samples Treatment_A Treatment_B Treatment_C
1          A        <NA>       first        <NA>
2          B       first        <NA>        <NA>
3          C        <NA>        <NA>       first
4          D      second       third       first
5          E      second       first        <NA>

Upvotes: 0

Carl
Carl

Reputation: 7540

A possible option using label_ordinal() if you don't mind an automated "1st", "2nd" etc. as an alternative to "first", "second" etc.

row_number can be switched dense_rank or min_rank depending on the ties method preference.

library(dplyr)
library(scales)

df |>
  mutate(
    across(-samples, \(x) row_number(as.numeric(x))),
    across(-samples, label_ordinal())
  )
#>   samples Treatment_A Treatment_B Treatment_C
#> 1       A        <NA>         2nd        <NA>
#> 2       B         1st        <NA>        <NA>
#> 3       C        <NA>        <NA>         1st
#> 4       D         2nd         3rd         2nd
#> 5       E         3rd         1st        <NA>

Created on 2024-05-07 with reprex v2.1.0

Upvotes: 0

SamR
SamR

Reputation: 20494

The basic logic is to rank() the columns. In this case we want the "keep" option for the na.last parameter:

a logical or character string controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.

As you want to use "First" rather than 1 etc. we can create an ordered character vector of ranks and then subset this by rank.

# This could be longer...
ranks <- c("First", "Second", "Third", "Fourth", "Fifth")

base R approach

# Your input is character rather than numeric
df <- type.convert(df, as.is = TRUE) 

cols <- grep("^Treatment", names(df), value = TRUE)
df[cols] <- lapply(df[cols], \(x) ranks[rank(x, na.last = "keep")])

df
#   samples Treatment_A Treatment_B Treatment_C
# 1       A        <NA>      Second        <NA>
# 2       B       First        <NA>        <NA>
# 3       C        <NA>        <NA>       First
# 4       D      Second       Third      Second
# 5       E       Third       First        <NA>

identical(df, df_output)
# [1] TRUE

tidyverse approach

The same basic approach can be taken using dplyr and readr:

library(dplyr)
df |>
    readr::type_convert() |>
    mutate(
        across(
            Treatment_A:Treatment_C,
            \(x) ranks[rank(x, na.last = "keep")]
        )
    )

#   samples Treatment_A Treatment_B Treatment_C
# 1       A        <NA>      Second        <NA>
# 2       B       First        <NA>        <NA>
# 3       C        <NA>        <NA>       First
# 4       D      Second       Third      Second
# 5       E       Third       First        <NA>

Upvotes: 1

Related Questions