Reputation: 2279
I have a huge data frame (28,987,853 rows) of the form:
head(ratRawData)
ratGene ratReplicate alignment RNAtype
1 C4b Thymus_M_GSM1328751 2 REG
2 Rpl4 Thymus_M_GSM1328751 4 REG
3 Dntt Thymus_M_GSM1328751 3 DUP
4 Sptbn1 Thymus_M_GSM1328751 2 DUP
5 Ndufb7 Thymus_M_GSM1328751 2 REG
6 Ndufb10 Thymus_M_GSM1328751 2 REV
What I want to do is change every occurrence of DUP in RNAtype to REV. Since this data frame is quite big, I am wondering what an efficient way of doing this would be. Thanks in advance!
Upvotes: 1
Views: 60
Reputation: 70653
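I timed three approaches on simulated data with the same number of rows (28,987,853): logical indexing with ==, indexing with grepl(), and relabelling the factor levels. Relabelling the levels was the fastest of the three.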
> set.seed(357)
> rat.raw.data <- data.frame(col1 = sample(letters, 28987853, replace = TRUE),
+ col2 = sample(1:10, 28987853, replace = TRUE),
+ col3 = sample(LETTERS, 28987853, replace = TRUE),
+ rna = sample(c("REG", "DUP", "REV"), 28987853, replace = TRUE),
+ stringsAsFactors = TRUE)  # needed in R >= 4.0, where strings are no longer factors by default
>
>
> dusty <- rat.raw.data
> system.time({dusty$rna[dusty$rna == "DUP"] <- "REV"})
user system elapsed
3.37 0.24 3.64
>
> akrun <- rat.raw.data
> system.time({akrun$rna[grepl("DUP", akrun$rna)]<- "REV"})
user system elapsed
5.06 0.04 5.18
>
> roman <- rat.raw.data
> system.time({levels(roman$rna) <- c("REV", "REG", "REV")})
user system elapsed
1.08 0.13 1.20
> head(dusty)
col1 col2 col3 rna
1 c 3 P REV
2 b 7 B REG
3 h 6 T REV
4 f 3 H REV
5 q 6 F REG
6 m 9 F REV
> head(akrun)
col1 col2 col3 rna
1 c 3 P REV
2 b 7 B REG
3 h 6 T REV
4 f 3 H REV
5 q 6 F REG
6 m 9 F REV
> head(roman)
col1 col2 col3 rna
1 c 3 P REV
2 b 7 B REG
3 h 6 T REV
4 f 3 H REV
5 q 6 F REG
6 m 9 F REV
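The levels() version is probably fastest because it only rewrites the handful of level labels instead of touching every row. It does depend on rna being a factor and on knowing the level order (alphabetical here: DUP, REG, REV). A minimal sketch on a toy vector, relabelling by name so the order does not matter; the names rna and chr are just illustrative:

```r
# Small stand-in for the 'rna' column, stored as a factor as in the timings.
rna <- factor(c("REG", "DUP", "REV", "DUP", "REG"))
levels(rna)        # "DUP" "REG" "REV" (sorted alphabetically)

# Relabel by name rather than by position; duplicate labels are merged.
levels(rna)[levels(rna) == "DUP"] <- "REV"

levels(rna)        # "REV" "REG"
as.character(rna)  # "REG" "REV" "REV" "REV" "REG"

# If the column is a plain character vector (the default in R >= 4.0),
# levels() does not apply; use logical indexing instead.
chr <- c("REG", "DUP", "REV")
chr[chr == "DUP"] <- "REV"
chr                # "REG" "REV" "REV"
```

On the real data this would be levels(ratRawData$RNAtype)[levels(ratRawData$RNAtype) == "DUP"] <- "REV", assuming RNAtype is a factor.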
Upvotes: 3