jonny jeep

Reputation: 417

group rows in table

I have a table called es_table with two columns, Var1 and Freq. The first column holds the element names, and rows 13 to 51 contain elements that are simple repeats:

es_table
                Var1 Freq
1   _L1_Arabidopsis1   39
2   _L1_Arabidopsis2    1
3   _L1_Arabidopsis3    2
4      _RTE_Anolis10  100
5       _RTE_Anolis2   14
6       _RTE_Anolis3    5
7       _RTE_Anolis4   19
8       _RTE_Anolis5    6
9       _RTE_Anolis6    1
10      _RTE_Anolis7   14
11      _RTE_Anolis8    7
12      _RTE_Anolis9    6
13              (A)n    1
14             (AA)n    8
15            (AAA)n   11
16           (AAAA)n   11
17          (AAAAA)n    4
18         (AAAAAA)n    1
19        (AAAAAAT)n    1
20         (AAAAAC)n    2
21        (AAAAACA)n    1
22         (AAAAAG)n    1
23        (AAAAAGA)n    1
24        (AAAAAGG)n    1
25        (AAAAAGT)n    1
26         (AAAAAT)n    3
27          (AAAAC)n    3
28         (AAAACA)n    1
29          (AAAAG)n    2
30         (AAAAGA)n    2
31        (AAAAGAA)n    1
32        (AAAAGAG)n    1
33          (AAAAT)n    5
34         (AAAATA)n    4
35        (GCTATAA)n    1
36        (TTTTTTT)n    1
37           (AAAC)n    1
38         (AAACAA)n    1
39           (AAAG)n   21
40          (AAAGA)n    3
41         (AAAGAA)n    4
42        (CCAGAAA)n    2
43         (AAAGAG)n    9
44          (AAAGG)n    1
45           (TCGA)n   11
46          (AAATA)n    3
47         (AAATAA)n    2
48        (CCCTAAA)n    3
49        (GTGTAAT)n    1
50      (AGTAGATAT)n    3
51        (AAATATA)n    1
52          Tx1-5_FR   16
53          U2snRNA1    1
54 VENSMAR1_Mariner     7
55 VENSMAR1_Mariner/    5
56 VENSMAR1_Mariner     7
57            ZhAT5_ZM  3
> 

My goal is to label all the simple repeats with a common classification prefix so they are easier to recognize. For example, I would like to obtain this:

es_table
                Var1 Freq
1   _L1_Arabidopsis1   39
2   _L1_Arabidopsis2    1
3   _L1_Arabidopsis3    2
4      _RTE_Anolis10  100
5       _RTE_Anolis2   14
6       _RTE_Anolis3    5
7       _RTE_Anolis4   19
8       _RTE_Anolis5    6
9       _RTE_Anolis6    1
10      _RTE_Anolis7   14
11      _RTE_Anolis8    7
12      _RTE_Anolis9    6
13              Simple_rep(A)n    1
14             Simple_rep(AA)n    8
15            Simple_rep(AAA)n   11
16           Simple_rep(AAAA)n   11
17          Simple_rep(AAAAA)n    4
18         Simple_rep(AAAAAA)n    1
19        Simple_rep(AAAAAAT)n    1
20         Simple_rep(AAAAAC)n    2
21        Simple_rep(AAAAACA)n    1
22         Simple_rep(AAAAAG)n    1
23        Simple_rep(AAAAAGA)n    1
24        Simple_rep(AAAAAGG)n    1
25        Simple_rep(AAAAAGT)n    1
26         Simple_rep(AAAAAT)n    3
27          Simple_rep(AAAAC)n    3
28         Simple_rep(AAAACA)n    1
29          Simple_rep(AAAAG)n    2
30         Simple_rep(AAAAGA)n    2
31        Simple_rep(AAAAGAA)n    1
32        Simple_rep(AAAAGAG)n    1
33          Simple_rep(AAAAT)n    5
34         Simple_rep(AAAATA)n    4
35        Simple_rep(GCTATAA)n    1
36        Simple_rep(TTTTTTT)n    1
37           Simple_rep(AAAC)n    1
38         Simple_rep(AAACAA)n    1
39           Simple_rep(AAAG)n   21
40          Simple_rep(AAAGA)n    3
41         Simple_rep(AAAGAA)n    4
42        Simple_rep(CCAGAAA)n    2
43         Simple_rep(AAAGAG)n    9
44          Simple_rep(AAAGG)n    1
45           Simple_rep(TCGA)n   11
46          Simple_rep(AAATA)n    3
47         Simple_rep(AAATAA)n    2
48        Simple_rep(CCCTAAA)n    3
49        Simple_rep(GTGTAAT)n    1
50      Simple_rep(AGTAGATAT)n    3
51        Simple_rep(AAATATA)n    1
52          Tx1-5_FR   16
53          U2snRNA1    1
54 VENSMAR1_Mariner     7
55 VENSMAR1_Mariner/    5
56 VENSMAR1_Mariner     7
57            ZhAT5_ZM  3
> 

Is there code that can produce this? Thanks and regards.

Upvotes: 1

Views: 56

Answers (2)

akrun

Reputation: 887981

If the positions of the simple repeats are already known, we can use paste0 to prepend the prefix to the required elements (if Var1 is a factor, convert it with as.character() first):

es_table$Var1[13:51] <- paste0("Simple_rep", es_table$Var1[13:51])
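The positional approach can be tried on a small subset of the question's table. This is a minimal sketch: the four rows are copied from the question, but building them with data.frame() and storing Var1 as a factor (as as.data.frame(table(x)) would) are assumptions about how es_table was created.

```r
# Toy subset of the question's table (construction is an assumption)
es_table <- data.frame(
  Var1 = factor(c("_RTE_Anolis9", "(A)n", "(AA)n", "Tx1-5_FR")),
  Freq = c(6, 1, 8, 16)
)

# A factor only accepts values that are already among its levels,
# so convert to character before writing new labels into it
es_table$Var1 <- as.character(es_table$Var1)

# Positions of the simple repeats in this toy table (13:51 in the real one)
rows <- 2:3
es_table$Var1[rows] <- paste0("Simple_rep", es_table$Var1[rows])

es_table
```

Without the as.character() step, assigning labels that are not existing factor levels would produce NA values with a warning.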

If the rows are to be identified by the presence of an opening parenthesis, we can use

library(dplyr)
library(stringr)
es_table <- es_table %>%
     mutate(Var1 = case_when(str_detect(Var1, "[(]") ~ 
                 str_c("Simple_rep", Var1), TRUE ~ Var1))
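If the aim is to treat the simple repeats as one group rather than only relabel them, a separate classification column may be easier to work with. The following is a sketch in base R under that assumption; the column name Class and the label "Other" are hypothetical and do not appear in the question, and the toy rows are a subset of the question's table.

```r
# Toy subset of the question's table
es_table <- data.frame(
  Var1 = c("_L1_Arabidopsis1", "(A)n", "(AAAG)n", "ZhAT5_ZM"),
  Freq = c(39, 1, 21, 3),
  stringsAsFactors = FALSE
)

# Hypothetical grouping column: names starting with "(" are simple repeats
es_table$Class <- ifelse(startsWith(es_table$Var1, "("), "Simple_rep", "Other")

# Sum the frequencies within each class
totals <- aggregate(Freq ~ Class, data = es_table, FUN = sum)
totals
```

Keeping the original names in Var1 and putting the classification in its own column leaves the repeat motifs untouched while still allowing grouped summaries.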

Upvotes: 0

Sotos

Reputation: 51622

You can search for the opening parenthesis and paste the prefix onto the matching rows, i.e.

es_table$Var1[grepl('\\(', es_table$Var1)] <- paste0('Simple_rep', es_table$Var1[grepl('\\(', es_table$Var1)])
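The same pattern-based idea can be written as a single substitution. This sketch assumes that every simple repeat name starts with "(" and that Var1 is a character column; the toy rows are a subset of the question's table.

```r
# Toy subset of the question's table
es_table <- data.frame(
  Var1 = c("_L1_Arabidopsis1", "(A)n", "(AAAG)n", "ZhAT5_ZM"),
  Freq = c(39, 1, 21, 3),
  stringsAsFactors = FALSE
)

# "^\\(" matches a literal "(" only at the start of the name,
# so names without a leading parenthesis are returned unchanged
es_table$Var1 <- sub("^\\(", "Simple_rep(", es_table$Var1)

# Running the substitution again changes nothing, because the
# prefixed names no longer start with "("
rerun <- sub("^\\(", "Simple_rep(", es_table$Var1)

es_table
```

Anchoring the pattern at the start of the string means the prefix is not added twice if the line is executed more than once, which can happen with the grepl/paste0 version.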

Upvotes: 2
