NIA

Reputation: 11

Deriving data for a column from another column

I have a data frame with three columns (gene_name, expression values, copy-number) and over 10,000 rows of data. The copy-number column is blank and needs to be derived from the expression values column using the following ranges:

Absolute value   Copy number
0-0.5            1
0.5-1.5          2
1.5-2.5          3
2.5-3.5          4

Any suggestions on how I can write an R script for the above?

Thank you for your suggestions.

Upvotes: 0

Views: 52

Answers (2)

TarJae

Reputation: 78917

Using the data from stefan's answer (thank you for this), we could alternatively use case_when:

library(dplyr)
dd %>% 
  mutate(copy_number = case_when(expression_values >= 0 & expression_values < 0.5 ~ 1,
                                 expression_values >= 0.5 & expression_values < 1.5 ~ 2,
                                 expression_values >= 1.5 & expression_values < 2.5 ~ 3,
                                 expression_values >= 2.5 & expression_values <= 3.5 ~ 4))

Output:

   gene_name expression_values copy_number
1          A         0.8940009           2
2          A         1.6180249           3
3          A         3.2900508           4
4          A         3.4237925           4
5          B         0.4112058           1
6          B         1.6624898           3
7          B         1.9611646           3
8          A         3.1641099           4
9          C         0.4854856           1
10         C         3.4611211           4
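
As a side note, the question asks for bins on the absolute value, and the sample data here happens to be all positive. The same left-closed intervals as in the case_when conditions could be sketched in base R with findInterval. The vector `expr` and its values are made up for illustration and are not from the original post; treating anything above 3.5 as NA is also an assumption.

```r
# Hypothetical signed expression values (not from the original post)
expr <- c(-0.2, 0.5, 1.49, -2.5, 3.5, 4.2)

breaks <- c(0, 0.5, 1.5, 2.5, 3.5)

# findInterval() uses left-closed intervals [a, b), like the case_when()
# conditions; rightmost.closed = TRUE puts exactly 3.5 into the last bin
copy_number <- findInterval(abs(expr), breaks, rightmost.closed = TRUE)

# Values above the last break get index 5; mark them as missing (assumption)
copy_number[copy_number > 4] <- NA

copy_number
#> [1]  1  2  2  4  4 NA
```

Because findInterval returns plain integers, the result can be assigned straight to the copy_number column without converting from a factor.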

Upvotes: 0

stefan

Reputation: 123893

You could use cut:

set.seed(42)

dd <- data.frame(
  gene_name = sample(LETTERS[1:3], 10, replace = TRUE),
  expression_values = runif(10, 0, 3.5),
  copy_number = NA
)
dd$copy_number <- cut(dd$expression_values, 
                      breaks = c(0, .5, 1.5, 2.5, 3.5), 
                      labels = 1:4,
                      include.lowest = TRUE)
dd
#>    gene_name expression_values copy_number
#> 1          A         0.8940009           2
#> 2          A         1.6180249           3
#> 3          A         3.2900508           4
#> 4          A         3.4237925           4
#> 5          B         0.4112058           1
#> 6          B         1.6624898           3
#> 7          B         1.9611646           3
#> 8          A         3.1641099           4
#> 9          C         0.4854856           1
#> 10         C         3.4611211           4
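
Two details may matter on real data. By default cut uses right-closed intervals, so a value of exactly 0.5 lands in bin 1, whereas the ranges in the question could also be read as left-closed. cut also returns a factor rather than a number. A minimal sketch that takes the absolute value first, uses left-closed bins and converts the result to integer might look like this; the small data frame is hypothetical and only there to show the edge cases.

```r
# Hypothetical data with negative values and values on the bin edges
dd <- data.frame(
  gene_name = c("A", "B", "C", "D"),
  expression_values = c(-0.3, 0.5, -1.7, 3.5)
)

# right = FALSE gives bins [0, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5];
# include.lowest = TRUE then closes the last bin so 3.5 is kept
bins <- cut(abs(dd$expression_values),
            breaks = c(0, 0.5, 1.5, 2.5, 3.5),
            labels = 1:4,
            right = FALSE,
            include.lowest = TRUE)

# Convert the factor labels back to integers
dd$copy_number <- as.integer(as.character(bins))

dd$copy_number
#> [1] 1 2 3 4
```

Absolute values above 3.5 fall outside the breaks and come back as NA, which is probably the safest default unless the ranges are meant to extend further.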

Upvotes: 1
