fugu

Reputation: 6578

Change column value depending on other columns

I have a dataframe:

  chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value
1    2L    13048   A   T            32            23           41.82         W           17            6          26.09        W       Germline    7.507123e-11       0.9437542
2    2L    16467   G   A             0            43          100.00         A            0           24         100.00        A           <NA>    6.674261e-40       1.0000000
3    2L    20682   T   A            32            14           30.43         W           14            6          30.00        W       Germline    1.746726e-07       0.6223244
4    2L    25727   T   G            52            22           29.73         K           16            4          20.00        K       Germline    2.000049e-09       0.8758070
5    2L    25729   A   T            49            23           31.94         W           16            4          20.00        W       Germline    7.938282e-10       0.9092970
6    2L    25741   T   C            45            28           38.36         Y           15            6          28.57        Y       Germline    1.497796e-12       0.8604958

I'm trying to change the value of the somatic_status column to "ROH" if both normal_var_freq and tumor_var_freq are > 90.

Here's what I've tried:

snps <- within(snps, somatic_status[normal_var_freq > 90 & tumor_var_freq > 90] <- 'ROH')

but I get this warning, and the affected values become NA:

Warning message:
In `[<-.factor`(`*tmp*`, normal_var_freq > 90 & tumor_var_freq >  :
  invalid factor level, NA generated

Can someone point me in the right direction?

Upvotes: 0

Views: 56

Answers (1)

akrun

Reputation: 887951

The warning arises because somatic_status is a factor and 'ROH' is not one of its levels. We can convert the factor to character class before assigning 'ROH' based on the logical vector ('i1')

i1 <- with(snps, normal_var_freq > 90 & tumor_var_freq > 90)
snps$somatic_status <- as.character(snps$somatic_status)
snps$somatic_status[i1] <- "ROH"

or, if we don't want to convert the factor column to character, add 'ROH' as a new level to the column before changing some of the elements to that value

levels(snps$somatic_status) <- c(levels(snps$somatic_status), "ROH")
snps$somatic_status[i1] <- "ROH"
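
For anyone who wants to try both options side by side, here is a minimal sketch. It assumes a small toy data frame that reuses the question's column names; the values are illustrative, and `opt1` and `opt2` are hypothetical names used only to keep the two results apart.

```r
# Toy data frame (illustrative values, same column names as the question).
# somatic_status is built as a factor explicitly, as in the original data.
snps <- data.frame(
  normal_var_freq = c(41.82, 100.00, 30.43),
  tumor_var_freq  = c(26.09, 100.00, 30.00),
  somatic_status  = factor(c("Germline", NA, "Germline"))
)

# Logical index of the rows where both frequencies exceed 90
i1 <- with(snps, normal_var_freq > 90 & tumor_var_freq > 90)

# Option 1: convert the factor to character, then assign
opt1 <- snps
opt1$somatic_status <- as.character(opt1$somatic_status)
opt1$somatic_status[i1] <- "ROH"

# Option 2: keep the factor, but add the new level before assigning
opt2 <- snps
levels(opt2$somatic_status) <- c(levels(opt2$somatic_status), "ROH")
opt2$somatic_status[i1] <- "ROH"
```

Option 1 leaves the column as character; Option 2 keeps it as a factor with the extra level, which may matter if later code relies on the factor levels.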

Regarding the usage of within, it is a useful function for creating new variables or updating existing ones, but assigning a new value to only a subset of a column's elements inside it is not recommended
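
If within is still preferred, one possible sketch is to rebuild the whole column inside it rather than assigning to a subset. This is an assumption about what a whole-column update could look like, not something stated above; the toy data is illustrative and uses the question's column names.

```r
# Toy data frame (illustrative values, same column names as the question)
snps <- data.frame(
  normal_var_freq = c(41.82, 100.00),
  tumor_var_freq  = c(26.09, 100.00),
  somatic_status  = factor(c("Germline", NA))
)

# Replace the whole column in one assignment: convert to character,
# then use replace() to set "ROH" where both frequencies exceed 90
snps <- within(snps, {
  somatic_status <- replace(as.character(somatic_status),
                            normal_var_freq > 90 & tumor_var_freq > 90,
                            "ROH")
})
```

Here the column ends up as character, so wrap the result in factor() if a factor is needed afterwards.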

Upvotes: 1
