Reputation: 11
I have a data frame with three columns (gene_name, expression values, and copy-number) and over 10,000 rows. The copy-number column is blank and needs to be derived from the expression values column using the following ranges:
Absolute value    Copy number
0 - 0.5           1
0.5 - 1.5         2
1.5 - 2.5         3
2.5 - 3.5         4
Any suggestions on how I can write an R script for the above?
Thank you for your suggestions.
Upvotes: 0
Views: 52
Reputation: 78917
Using the data from stefan's answer (thank you for this), we could alternatively use case_when:
library(dplyr)

dd %>%
  mutate(copy_number = case_when(
    expression_values >= 0 & expression_values < 0.5 ~ 1,
    expression_values >= 0.5 & expression_values < 1.5 ~ 2,
    expression_values >= 1.5 & expression_values < 2.5 ~ 3,
    expression_values >= 2.5 & expression_values <= 3.5 ~ 4
  ))
Output:
gene_name expression_values copy_number
1 A 0.8940009 2
2 A 1.6180249 3
3 A 3.2900508 4
4 A 3.4237925 4
5 B 0.4112058 1
6 B 1.6624898 3
7 B 1.9611646 3
8 A 3.1641099 4
9 C 0.4854856 1
10 C 3.4611211 4
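The question mentions absolute values, and the conditions above return NA for anything negative or above 3.5. A minimal sketch of one way to handle both, assuming negative expression values should be binned by magnitude (the data frame dd2 and the helper column abs_expr are made up for illustration):

```r
library(dplyr)

# Hypothetical data: one negative value and one value above the 0-3.5 range
dd2 <- data.frame(
  gene_name = c("A", "B", "C", "D"),
  expression_values = c(-0.8, 0.2, 2.5, 4.1)
)

res <- dd2 %>%
  mutate(
    abs_expr = abs(expression_values),  # bin by magnitude (assumption)
    copy_number = case_when(
      # case_when() takes the first matching condition, so the lower
      # bound of each range is implied by the conditions before it
      abs_expr < 0.5  ~ 1,
      abs_expr < 1.5  ~ 2,
      abs_expr < 2.5  ~ 3,
      abs_expr <= 3.5 ~ 4,
      TRUE ~ NA_real_                   # explicit fallback for out-of-range values
    )
  )

res$copy_number
#> [1]  2  1  4 NA
```

Because the conditions are checked in order, each one only needs the upper bound, which keeps the boundaries identical to the version above (2.5 still maps to 4).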
Upvotes: 0
Reputation: 123893
You could use cut:
set.seed(42)
dd <- data.frame(
  gene_name = sample(LETTERS[1:3], 10, replace = TRUE),
  expression_values = runif(10, 0, 3.5),
  copy_number = NA
)

dd$copy_number <- cut(dd$expression_values,
  breaks = c(0, .5, 1.5, 2.5, 3.5),
  labels = 1:4,
  include.lowest = TRUE
)
dd
#> gene_name expression_values copy_number
#> 1 A 0.8940009 2
#> 2 A 1.6180249 3
#> 3 A 3.2900508 4
#> 4 A 3.4237925 4
#> 5 B 0.4112058 1
#> 6 B 1.6624898 3
#> 7 B 1.9611646 3
#> 8 A 3.1641099 4
#> 9 C 0.4854856 1
#> 10 C 3.4611211 4
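Two details of cut may matter on real data: by default the bins are closed on the right, so a value of exactly 0.5 lands in bin 1 rather than bin 2, and the result is a factor rather than a number. A small sketch of both points, using made-up values that sit exactly on the bin edges (the vector x is only for illustration):

```r
# Hypothetical values chosen to sit on the bin edges
x <- c(0, 0.5, 1.5, 2.5, 3.5)
breaks <- c(0, 0.5, 1.5, 2.5, 3.5)

# Default (right = TRUE): bins are [0, 0.5], (0.5, 1.5], (1.5, 2.5], (2.5, 3.5]
right_closed <- cut(x, breaks = breaks, labels = 1:4, include.lowest = TRUE)

# right = FALSE: bins are [0, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5]
left_closed <- cut(x, breaks = breaks, labels = 1:4,
                   right = FALSE, include.lowest = TRUE)

# cut() returns a factor; as.integer() returns the level index, which
# equals the copy number here only because the labels are 1:4 in order
as.integer(right_closed)
#> [1] 1 1 2 3 4
as.integer(left_closed)
#> [1] 1 2 3 4 4
```

With right = FALSE the boundaries match the >= / < conditions used in the case_when answer. If negative expression values are possible and should be binned by magnitude, wrapping the input in abs() before calling cut would be one option.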
Upvotes: 1