KLM117

Reputation: 467

Creating a column with factor variables conditional on multiple other columns?

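I have 4 columns, called Amplification, CNV.gain, Homozygous.Deletion.Frequency and Heterozygous.Deletion.Frequency. I want to create a new column that is set to 'Low', 'Medium' or 'High' depending on whether any of the values in these 4 columns falls in the corresponding range (the ranges I am aiming for are the ones in my code below).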

An example of the final table (long_fused) would look like this:

CNV.Gain Amplification Homozygous.Deletion.Frequency Heterozygous.Deletion.Frequency Threshold
3 5 10 0 Low
0 0 11 8 Medium
7 16 25 0 High

So far, I've tried the following code. It does fill in the "Threshold" column, but it does so incorrectly.

library(dplyr)
long_fused <- long_fused %>%
  mutate(Percent_sample_altered = case_when(
    Amplification>=5 & Amplification < 10 & CNV.gain>=5 & CNV.gain < 10 | CNV.gain>=5 & CNV.gain<=10 & Homozygous.Deletion.Frequency>=5 & Homozygous.Deletion.Frequency<=10| Heterozygous.Deletion.Frequency>=5 & Heterozygous.Deletion.Frequency<=10 ~ 'Low',
    Amplification>= 10 & Amplification<20 |CNV.gain>=10 & CNV.gain<20| Homozygous.Deletion.Frequency>= 10 & Homozygous.Deletion.Frequency<20 | Heterozygous.Deletion.Frequency>=10 & Heterozygous.Deletion.Frequency<20 ~ 'Medium', 
    Amplification>20 | CNV.gain >20 | Homozygous.Deletion.Frequency >20 | Heterozygous.Deletion.Frequency>20 ~ 'High'))

As always any help is appreciated!


Data in dput format

long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L, 
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L), 
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold = 
c("Low", "Medium", "High")), class = "data.frame", 
row.names = c(NA, -3L))

Upvotes: 2

Views: 502

Answers (2)

Ronak Shah

Reputation: 388982

Here's an alternative that takes the row-wise maximum with pmax and then labels it with case_when:

library(dplyr)

long_fused %>%
  mutate(max = do.call(pmax, select(., -Threshold)),
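  # If you don't have a Threshold column in your data, just use `.` instead:
  # mutate(max = do.call(pmax, .),
  # Note: TRUE is a catch-all, so any max not matched above (including values below 5) becomes 'High'.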
         Threshold = case_when(between(max, 5, 10) ~ 'Low', 
                               between(max, 11, 15) ~ 'Medium', 
                               TRUE ~ 'High'))

#  CNV.Gain Amplification Homozygous.Deletion.Frequency
#1        3             5                            10
#2        0             0                            11
#3        7            16                            25

#  Heterozygous.Deletion.Frequency max Threshold
#1                               0  10       Low
#2                               8  11    Medium
#3                               0  25      High
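
If rows whose maximum is below 5 should stay unlabelled rather than fall into the catch-all, one option is to give every label an explicit range. The following is only a sketch: the cut-offs (5 to 10, above 10 to 20, above 20) are an assumption taken from the code in the question, and `row_max` and `below_cutoff` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Sample data from the question (Threshold column omitted)
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L)
)

# Hypothetical extra row in which every value is below 5
below_cutoff <- data.frame(
  CNV.Gain = 1L, Amplification = 2L,
  Homozygous.Deletion.Frequency = 0L,
  Heterozygous.Deletion.Frequency = 3L
)

result <- bind_rows(long_fused, below_cutoff) %>%
  mutate(
    # row-wise maximum over the four frequency columns
    row_max = pmax(CNV.Gain, Amplification,
                   Homozygous.Deletion.Frequency,
                   Heterozygous.Deletion.Frequency),
    # assumed cut-offs; with no catch-all, unmatched rows get NA
    Threshold = case_when(
      row_max >= 5 & row_max <= 10 ~ "Low",
      row_max > 10 & row_max <= 20 ~ "Medium",
      row_max > 20 ~ "High"
    )
  )

result$Threshold
```

With this version the extra row gets NA in Threshold instead of 'High'.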

Upvotes: 3

Rui Barradas

Reputation: 76402

Here is a way with rowwise followed by base function cut.

library(dplyr)

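long_fused %>%
  rowwise() %>%
  mutate(new = max(c_across(-Threshold)),
         # cut() has no left.open argument (that belongs to findInterval). right = TRUE, the default,
         # already closes intervals on the right; include.lowest = TRUE also keeps a max of exactly 5.
         new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"),
                   right = TRUE, include.lowest = TRUE)) %>%
  ungroup()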
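
For comparison, the same idea can be sketched in base R without rowwise, by taking the row maximum with pmax and passing it to cut. This is a minimal sketch, not a definitive implementation: `label_max` and `value_cols` are hypothetical names, and the breaks (5, 10, 20) are assumed from the code in the question. `include.lowest = TRUE` makes a maximum of exactly 5 count as "Low".

```r
# Data from the question
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L),
  Threshold = c("Low", "Medium", "High")
)

# Hypothetical helper: bins are [5, 10], (10, 20], (20, Inf]; anything below 5 becomes NA
label_max <- function(x) {
  cut(x, breaks = c(5, 10, 20, Inf),
      labels = c("Low", "Medium", "High"),
      right = TRUE, include.lowest = TRUE)
}

# All columns except the existing Threshold column
value_cols <- setdiff(names(long_fused), "Threshold")

# Vectorised row-wise maximum, then binned into a factor
long_fused$new <- label_max(do.call(pmax, long_fused[value_cols]))

# Behaviour at the bin edges
edge_labels <- label_max(c(4, 5, 10, 20, 21))
```

The result is a factor column, and values below the lowest break come back as NA rather than being assigned a label.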

Upvotes: 3
