Reputation: 417
I have a table called es_table with two columns, $Var1 and $Freq. The first column holds the element names, and rows 13 to 51 are elements that belong to Simple repeats:
es_table
Var1 Freq
1 _L1_Arabidopsis1 39
2 _L1_Arabidopsis2 1
3 _L1_Arabidopsis3 2
4 _RTE_Anolis10 100
5 _RTE_Anolis2 14
6 _RTE_Anolis3 5
7 _RTE_Anolis4 19
8 _RTE_Anolis5 6
9 _RTE_Anolis6 1
10 _RTE_Anolis7 14
11 _RTE_Anolis8 7
12 _RTE_Anolis9 6
13 (A)n 1
14 (AA)n 8
15 (AAA)n 11
16 (AAAA)n 11
17 (AAAAA)n 4
18 (AAAAAA)n 1
19 (AAAAAAT)n 1
20 (AAAAAC)n 2
21 (AAAAACA)n 1
22 (AAAAAG)n 1
23 (AAAAAGA)n 1
24 (AAAAAGG)n 1
25 (AAAAAGT)n 1
26 (AAAAAT)n 3
27 (AAAAC)n 3
28 (AAAACA)n 1
29 (AAAAG)n 2
30 (AAAAGA)n 2
31 (AAAAGAA)n 1
32 (AAAAGAG)n 1
33 (AAAAT)n 5
34 (AAAATA)n 4
35 (GCTATAA)n 1
36 (TTTTTTT)n 1
37 (AAAC)n 1
38 (AAACAA)n 1
39 (AAAG)n 21
40 (AAAGA)n 3
41 (AAAGAA)n 4
42 (CCAGAAA)n 2
43 (AAAGAG)n 9
44 (AAAGG)n 1
45 (TCGA)n 11
46 (AAATA)n 3
47 (AAATAA)n 2
48 (CCCTAAA)n 3
49 (GTGTAAT)n 1
50 (AGTAGATAT)n 3
51 (AAATATA)n 1
52 Tx1-5_FR 16
53 U2snRNA1 1
54 VENSMAR1_Mariner 7
55 VENSMAR1_Mariner/ 5
56 VENSMAR1_Mariner 7
57 ZhAT5_ZM 3
>
My goal is to give all the Simple repeats a common classification prefix so they are easier to recognize. For example, I would like to obtain this:
es_table
Var1 Freq
1 _L1_Arabidopsis1 39
2 _L1_Arabidopsis2 1
3 _L1_Arabidopsis3 2
4 _RTE_Anolis10 100
5 _RTE_Anolis2 14
6 _RTE_Anolis3 5
7 _RTE_Anolis4 19
8 _RTE_Anolis5 6
9 _RTE_Anolis6 1
10 _RTE_Anolis7 14
11 _RTE_Anolis8 7
12 _RTE_Anolis9 6
13 Simple_rep(A)n 1
14 Simple_rep(AA)n 8
15 Simple_rep(AAA)n 11
16 Simple_rep(AAAA)n 11
17 Simple_rep(AAAAA)n 4
18 Simple_rep(AAAAAA)n 1
19 Simple_rep(AAAAAAT)n 1
20 Simple_rep(AAAAAC)n 2
21 Simple_rep(AAAAACA)n 1
22 Simple_rep(AAAAAG)n 1
23 Simple_rep(AAAAAGA)n 1
24 Simple_rep(AAAAAGG)n 1
25 Simple_rep(AAAAAGT)n 1
26 Simple_rep(AAAAAT)n 3
27 Simple_rep(AAAAC)n 3
28 Simple_rep(AAAACA)n 1
29 Simple_rep(AAAAG)n 2
30 Simple_rep(AAAAGA)n 2
31 Simple_rep(AAAAGAA)n 1
32 Simple_rep(AAAAGAG)n 1
33 Simple_rep(AAAAT)n 5
34 Simple_rep(AAAATA)n 4
35 Simple_rep(GCTATAA)n 1
36 Simple_rep(TTTTTTT)n 1
37 Simple_rep(AAAC)n 1
38 Simple_rep(AAACAA)n 1
39 Simple_rep(AAAG)n 21
40 Simple_rep(AAAGA)n 3
41 Simple_rep(AAAGAA)n 4
42 Simple_rep(CCAGAAA)n 2
43 Simple_rep(AAAGAG)n 9
44 Simple_rep(AAAGG)n 1
45 Simple_rep(TCGA)n 11
46 Simple_rep(AAATA)n 3
47 Simple_rep(AAATAA)n 2
48 Simple_rep(CCCTAAA)n 3
49 Simple_rep(GTGTAAT)n 1
50 Simple_rep(AGTAGATAT)n 3
51 Simple_rep(AAATATA)n 1
52 Tx1-5_FR 16
53 U2snRNA1 1
54 VENSMAR1_Mariner 7
55 VENSMAR1_Mariner/ 5
56 VENSMAR1_Mariner 7
57 ZhAT5_ZM 3
>
Is there code that would produce this? Thanks.
Upvotes: 1
Views: 56
Reputation: 887981
If the rows holding the repeats are already known, we can use paste0 to prepend the prefix to those elements:
es_table$Var1[13:51] <- paste0("Simple_rep", es_table$Var1[13:51])
If the pattern to identify is an opening parenthesis, (, we can use
library(dplyr)
library(stringr)
es_table %>%
  # Prefix names containing "(" and leave all other names unchanged
  mutate(Var1 = case_when(str_detect(Var1, "[(]") ~ str_c("Simple_rep", Var1),
                          TRUE ~ Var1))
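One thing worth checking before either approach: a table with columns named Var1 and Freq often comes from as.data.frame(table(...)), in which case Var1 would be a factor, and assigning labels that are not existing levels produces NA. The sketch below assumes that situation (the question does not confirm it) and uses a four-row stand-in for es_table built from values in the question. It is a base R illustration, not a replacement for the code above.

```r
# Small stand-in for es_table; the values are copied from the question.
es_table <- data.frame(
  Var1 = c("_RTE_Anolis9", "(A)n", "(AAAG)n", "Tx1-5_FR"),
  Freq = c(6, 1, 21, 16),
  stringsAsFactors = TRUE  # assumption: Var1 arrives as a factor
)

# Work on character values so the new labels are not turned into NA.
es_table$Var1 <- as.character(es_table$Var1)

# Prefix only the names that start with "(".
is_repeat <- startsWith(es_table$Var1, "(")
es_table$Var1[is_repeat] <- paste0("Simple_rep", es_table$Var1[is_repeat])

print(es_table)
```

If Var1 is already a character column, the as.character() call does nothing harmful, so it can stay in as a safeguard.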
Upvotes: 0
Reputation: 51622
You can search for the opening parenthesis and paste the prefix onto the matching elements, i.e.
idx <- grepl('\\(', es_table$Var1)
es_table$Var1[idx] <- paste0('Simple_rep', es_table$Var1[idx])
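A possible variation, sketched here on a plain character vector: sub() with the pattern anchored at the start of the string does the match and the prefixing in one call, and leaves names untouched when the parenthesis appears later in the name. The name "Gypsy(2)" is hypothetical and only there to show that case; the other values come from the question.

```r
# Sample names from the question, plus one hypothetical name ("Gypsy(2)")
# whose parenthesis is not at the start.
var1 <- c("_L1_Arabidopsis1", "(AAAG)n", "Gypsy(2)", "ZhAT5_ZM")

# "^\\(" matches "(" only at the start of a string;
# strings without a match are returned unchanged.
renamed <- sub("^\\(", "Simple_rep(", var1)

print(renamed)
```

Whether anchoring is wanted depends on the data: in the table shown, every Simple repeat name starts with "(", so both the anchored and unanchored patterns select the same rows.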
Upvotes: 2