Murph

Reputation: 33

Errors with the align_local function in R

I am trying to compare two gene sequences:

sequence_1 <- "MPHLENVVLCRESQVSILQSLFGERHHFSFPSIFIYGHTASGKTYVTQTLLKTLELPHVFVNCVECFTLRLLLEQILNKLNHLSSSEDGCSTEITCETFNDFVRLFKQVTTAENLKDQTVYIVLDKAEYLRDMEANLLPGFLRLQELADRNVTVLFLSEIVWEKFRPNTGCFEPFVLYFPDYSIGNLQKILSHDHPPEYSADFYAAYINILLGVFYTVCRDLKELRHLAVLNFPKYCEPVVKGEASERDTRKLWRNIEPHLKKAMQTVYLREISSSQWEKLQKDDTDPGQLKGLSAHTHVELPYYSKFILIAAYLASYNPARTDKRFFLKHHGKIKKTNFLKKHEKTSNHLLGPKPFPLDRLLAILYSIVDSRVAPTANIFSQITSLVTLQLLTLVGHDDQLDGPKYKCTVSLDFIRAIARTVNFDIIKYLYDFL"

sequence_2 <- "MEEEAPRFNVLEEAFNGNGNGCANVEATQSAILKVLTRVNRFQMRVRKHIEDNYTEFLPNNTSPDIFLEESGSLNREIHDMLENLGSEGLDALDEANVKMAGNGRQLREILLGLGVSEHVLRIDELFQCVEEAKATKDYLVLLDLVGRLRAFIYGDDSVDGDAQVATPEVRRIFKALECYETIKVKYHVQAYMLQQSLQERFDRLVQLQCKSFPTSRCVTLQVSRDQTQLQDIVQALFQEPYNPARLCEFLLDNCIEPVIMRPVMADYSEEADGGTYVRLSLSYATKEPSSAHVRPNYKQVLENLRLLLHTLAGINCSVSRDQHVFGIIGDHVKDKMLKLLVDECLIPAVPESTEEYQTSTLCEDVAQLEQLLVDSFIINPEQDRALGQFVEKYETYYRNRMYRRVLETAREIIQRDLQDMVLVAPNNHSAEVANDPFLFPRCMISKSAQDFVKLMDRILRQPTDKLGDQEADPIAGVISIMLHTYINEVPKVHRKLLESIPQQAVLFHNNCMFFTHWVAQHANKGIESLAALAKTLQATGQQHFRVQVDYQSSILMGIMQEFEFESTHTLGSGPLKLVRQCLRQLELLKNVWANVLPETVYNATFCELINTFVAELIRRVFTLRDISAQMACELSDLIDVVLQRAPTLFREPNEVVQVLSWLKLQQLKAMLNASLMEITELWGDGVGPLTASYKSDEIKHLIRALFQDTDWRAKAITQIV"

using the align_local function from the textreuse package. My input is:

library(textreuse)
align_local(sequence_1, sequence_2)

and I get the error:

Error in b_out[out_i] <- b_orig[row_i - 1] : replacement has length zero
In addition: Warning message:
Multiple optimal local alignments found; selecting only one of them. 

I've tried adjusting the match and mismatch scores, but to no avail. Any advice would be appreciated.

Upvotes: 3

Views: 459

Answers (2)

Lincoln Mullen

Reputation: 6455

The textreuse package is intended for natural language. Under no circumstances should you use it for aligning gene sequences. (I am the package author.) You probably want the Biostrings package from Bioconductor.

The problem is that the align_local() function expects multiple words, delimited by spaces or punctuation, because it aligns word by word, not character by character. Each of your sequences is one unbroken string, so the function sees it as a single word. The function would work if you put spaces between the characters in your sequence. But I'm not going to explain how to do that because, again, you should not be using a natural language package for aligning genes.
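To illustrate the word-by-word point, here is a minimal base R sketch. It only approximates word tokenization by splitting on whitespace and punctuation; split_words is a hypothetical helper written for this example and is not the tokenizer that textreuse uses internally. The protein fragment is the first 30 characters of sequence_1 from the question.

```r
# Toy inputs: an ordinary sentence and a short fragment of sequence_1
sentence <- "The quick brown fox jumps over the lazy dog."
protein  <- "MPHLENVVLCRESQVSILQSLFGERHHFSF"

# Hypothetical helper: split on runs of whitespace or punctuation.
# This is an approximation for illustration, not textreuse's own tokenizer.
split_words <- function(x) {
  tokens <- strsplit(x, "[[:space:][:punct:]]+")[[1]]
  tokens[nzchar(tokens)]  # drop any empty strings
}

n_sentence <- length(split_words(sentence))
n_protein  <- length(split_words(protein))

cat("Tokens in sentence:", n_sentence, "\n")
cat("Tokens in protein: ", n_protein, "\n")
```

The sentence breaks into many tokens, while the sequence stays a single token, so a word-level aligner has nothing to line up.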

Upvotes: 4

Marija T.

Reputation: 53

The problem here is that the align_local function from the textreuse package is meant for analyzing text documents and detecting passages that have been re-used, which means it works on space-separated words in a sentence.

My suggestion would be to try to find a package that is more suitable for handling genes.

For example, the dotPlot function from seqinr gives you a visual representation of the comparison.
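A minimal sketch of such a dot plot, assuming the seqinr package is installed. The two strings are the first 30 characters of sequence_1 and sequence_2 from the question, and the wsize and nmatch values are arbitrary choices for illustration. dotPlot expects each sequence as a vector of single characters, so the strings are split first.

```r
# Short fragments taken from the start of the two sequences in the question
seq_a <- "MPHLENVVLCRESQVSILQSLFGERHHFSF"
seq_b <- "MEEEAPRFNVLEEAFNGNGNGCANVEATQS"

# dotPlot() works on vectors of single characters, so split each string
chars_a <- strsplit(seq_a, "")[[1]]
chars_b <- strsplit(seq_b, "")[[1]]

# Only draw the plot if seqinr is available (assumption: installed from CRAN)
if (requireNamespace("seqinr", quietly = TRUE)) {
  # A dot is drawn where at least nmatch of wsize residues in a window agree
  seqinr::dotPlot(chars_a, chars_b, wsize = 3, wstep = 1, nmatch = 2,
                  xlab = "sequence_1", ylab = "sequence_2")
}
```

Diagonal runs of dots indicate regions where the two sequences are similar. For the full sequences, pass the complete strings through the same strsplit step.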

Upvotes: 1
