user2498657

Combining sequences that share a gene ID

I have a list in R whose element names are gene IDs and whose values are sequences. Several elements share the same ID:

$2435
[1] "ATGCGGGCGGGGGTCGTCGA"

$2435
[1] "ATGCGGCGCGCGCGCTATATACGC"

$2435
[1] "ATGCGGCGCCTCTCATCGCGGGGG"

I want to concatenate the sequences that share a gene ID into a single sequence, like this:

$2435
[1] "ATGCGGGCGGGGGTCGTCGAATGCGGCGCGCGCGCTATATACGCATGCGGCGCCTCTCATCGCGGGGG"
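
A reproducible version of this example, assuming the gene IDs are stored as the list's names (which is what the `$2435` printout suggests). The names `seqs`, `groups` and `combined` are made up for illustration, and the grouping here uses base R's `split()`, which is just one possible route:

```r
# Hypothetical reconstruction of the question's data: each gene ID is
# assumed to be the (repeated) name of a list element.
seqs <- list(
  "2435" = "ATGCGGGCGGGGGTCGTCGA",
  "2435" = "ATGCGGCGCGCGCGCTATATACGC",
  "2435" = "ATGCGGCGCCTCTCATCGCGGGGG"
)

# Group the sequences by name; split() keeps the original order within each group.
groups <- split(unlist(seqs, use.names = FALSE), names(seqs))

# Paste each group into one string; the result is again a list named by gene ID.
combined <- lapply(groups, paste, collapse = "")
combined
# $`2435`
# [1] "ATGCGGGCGGGGGTCGTCGAATGCGGCGCGCGCGCTATATACGCATGCGGCGCCTCTCATCGCGGGGG"
```

Note that `split()` orders the groups by sorted ID rather than by first appearance, which only matters when the list contains several different IDs.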

Answers (3)

Ferdinand.kraft

Bonus:

For a data frame output, use `aggregate`:

aggregate(unlist(A), by=list(id=names(A)), paste, collapse="")

Here `A` is your list.

Using @Ananda's A, I get this:

  id                                       x
1 10                        FFFFGGGGHHHHIIII
2 12 AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX
3 34                                    GGGG
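
If you later need a named list like the one in the question, one option is to pair the `x` column with the `id` column via `setNames()`. This is a sketch that reuses @Ananda's sample `A`; the names `res` and `combined` are hypothetical:

```r
# @Ananda's sample list: names are gene IDs, some of them repeated
A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")

# One row per ID; column x holds the concatenated sequence
res <- aggregate(unlist(A), by = list(id = names(A)), paste, collapse = "")

# Convert the two columns back into a list keyed by ID
combined <- setNames(as.list(res$x), res$id)
str(combined)
```

The rows (and therefore the list elements) come out sorted by ID, just like the printed data frame.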

Julius Vainora

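`tapply` splits the list by its names and applies `paste(..., collapse = "")` to each group; `simplify = FALSE` makes it return one list element per name:

l <- list("A" = "ABC", "B" = "XYX", "A" = "DEF", "C" = "YZY", "A" = "GHI")
tapply(l, names(l), paste, collapse = "", simplify = FALSE)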
# $A
# [1] "ABCDEFGHI"
# 
# $B
# [1] "XYX"
# 
# $C
# [1] "YZY"
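
Because every group collapses to a single string, you can also pass a character vector and keep the default `simplify = TRUE`; `tapply` then returns a 1-d character array. A sketch (the `unlist()` call and the names `res` and `combined` are additions for illustration) that turns that array into a plain named character vector:

```r
l <- list("A" = "ABC", "B" = "XYX", "A" = "DEF", "C" = "YZY", "A" = "GHI")

# With a character vector as input and simplify = TRUE (the default),
# tapply() returns a 1-d character array with one cell per name
res <- tapply(unlist(l), names(l), paste, collapse = "")

# Drop the array attributes, keeping the names, to get a plain named vector
combined <- setNames(as.vector(res), names(res))
combined
```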

A5C1D2H2I1M1N2O1R2T1

Use `lapply` over the unique names, pasting together the elements that share each name. Here's some sample data:

A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")
A
# $`12`
# [1] "AAAABBBBCCCCDDDD"
# 
# $`34`
# [1] "GGGG"
# 
# $`12`
# [1] "XXXXXXXXXXXXXXXXXXXXXXX"
# 
# $`10`
# [1] "FFFFGGGG"
# 
# $`10`
# [1] "HHHHIIII"

For each unique name, subset the elements carrying that name and paste them together:

lapply(unique(names(A)), function(x) paste(A[names(A) %in% x], collapse = ""))
# [[1]]
# [1] "AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX"
# 
# [[2]]
# [1] "GGGG"
# 
# [[3]]
# [1] "FFFFGGGGHHHHIIII"
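
The `lapply` result above is unnamed, so the gene IDs are lost. A minimal variation of the same idea (`ids` and `combined` are illustrative names) attaches them with `setNames()`. Since each `id` is a single string, `==` behaves the same as `%in%` here:

```r
A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")

# IDs in order of first appearance: "12", "34", "10"
ids <- unique(names(A))

# Paste together all elements with a given ID, then label each result with it
combined <- setNames(
  lapply(ids, function(id) paste(A[names(A) == id], collapse = "")),
  ids
)
combined
```

Unlike the `tapply` and `aggregate` versions, this keeps the IDs in their order of first appearance instead of sorting them.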
