Reputation: 11
I would like to kindly ask for the help of the community in reshaping a text file. The text file looks like this:
TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7
GO:0000226
GO:0006139
GO:0006259
TRINITY_GG_17866_c5_g1_i1
GO:0003674
GO:0005488
What I would like to get in the end is this (tab-separated):
TRINITY_GG_17866_c1_g1_i7 GO:0000226
TRINITY_GG_17866_c1_g1_i7 GO:0006139
TRINITY_GG_17866_c1_g1_i7 GO:0006259
TRINITY_GG_17866_c5_g1_i1 GO:0003674
TRINITY_GG_17866_c5_g1_i1 GO:0005488
I could not come up with any solutions so far on how to do this. I would really appreciate any advice on this issue.
Best wishes, Ferenc
Upvotes: 1
Views: 26
Reputation: 39858
One dplyr option could be:
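library(dplyr)

df %>%
  # start a new group at every line that is not a GO term
  group_by(grp = cumsum(!startsWith(V1, "GO:"))) %>%
  # drop IDs that have no GO terms below them
  filter(n() > 1) %>%
  # pair each row with the line after it, then repeat the ID down the group
  mutate(V2 = lead(V1),
         V1 = first(V1)) %>%
  # the last row of each group has no following line, so V2 is NA there
  na.omit() %>%
  ungroup() %>%
  select(-grp)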
V1 V2
<chr> <chr>
1 TRINITY_GG_17866_c1_g1_i7 GO:0000226
2 TRINITY_GG_17866_c1_g1_i7 GO:0006139
3 TRINITY_GG_17866_c1_g1_i7 GO:0006259
4 TRINITY_GG_17866_c5_g1_i1 GO:0003674
5 TRINITY_GG_17866_c5_g1_i1 GO:0005488
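To see why the grouping works, here is a minimal sketch of the `cumsum()` key on its own. The vector `x` and its short labels are made up for illustration and are not part of the original data:

```r
# Toy input: "id_*" stands in for a transcript ID, "GO:*" for a GO term
x <- c("id_a", "id_b", "GO:1", "GO:2", "id_c", "GO:3")

# TRUE for every line that is NOT a GO term, i.e. every ID line
is_id <- !startsWith(x, "GO:")

# The running count of ID lines gives each ID and its GO terms the same key
grp <- cumsum(is_id)
grp
# 1 2 2 2 3 3

# Group sizes: group 1 has only the ID itself, so filter(n() > 1) drops it
sizes <- as.vector(table(grp))
sizes
# 1 3 2
```

An ID with no GO terms ends up alone in its group, which is why the `n() > 1` filter removes it.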
Or as one column:
df %>%
  group_by(grp = cumsum(!startsWith(V1, "GO:"))) %>%
  filter(n() > 1) %>%
  mutate(V2 = lead(V1),
         V1 = first(V1)) %>%
  na.omit() %>%
  ungroup() %>%
  select(-grp) %>%
  transmute(V1 = paste(V1, V2))
V1
<chr>
1 TRINITY_GG_17866_c1_g1_i7 GO:0000226
2 TRINITY_GG_17866_c1_g1_i7 GO:0006139
3 TRINITY_GG_17866_c1_g1_i7 GO:0006259
4 TRINITY_GG_17866_c5_g1_i1 GO:0003674
5 TRINITY_GG_17866_c5_g1_i1 GO:0005488
Sample data:
df <- read.table(text = "TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7
GO:0000226
GO:0006139
GO:0006259
TRINITY_GG_17866_c5_g1_i1
GO:0003674
GO:0005488",
header = FALSE,
stringsAsFactors = FALSE)
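Since the question asks for tab-separated output, here is a rough base R sketch of the same reshaping that also writes the result to a file. It is an alternative to the dplyr pipeline, not a replacement for it. The names `lines`, `res` and `out_file` are made up for illustration, and it assumes the first line of the input is an ID rather than a GO term:

```r
# Illustrative input; with a real file this could be readLines() on its path
lines <- c("TRINITY_GG_17866_c6_g1_i1",
           "TRINITY_GG_17866_c3_g1_i1",
           "TRINITY_GG_17866_c1_g1_i7",
           "GO:0000226",
           "GO:0006139",
           "GO:0006259",
           "TRINITY_GG_17866_c5_g1_i1",
           "GO:0003674",
           "GO:0005488")

is_go <- startsWith(lines, "GO:")
ids   <- lines[!is_go]        # all ID lines, in order
grp   <- cumsum(!is_go)       # index of the most recent ID for every line

# One row per GO term, paired with the ID it belongs to
res <- data.frame(V1 = ids[grp[is_go]],
                  V2 = lines[is_go],
                  stringsAsFactors = FALSE)

# Write as tab-separated text without quotes, row names or a header
out_file <- tempfile(fileext = ".tsv")   # assumption: any writable path works
write.table(res, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The same `write.table()` call should also work on the two-column result of the dplyr version above.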
Upvotes: 1