Hsn SA
Hsn SA

Reputation: 1

Extract user specified sequence from reverse strand of from FASTA file Using samtools

I have a list of regions with start and end points.

I used the samtools faidx ref.fa <region> command. This command gave me the forward strand sequence for that region.

In the samtools manual there is an option to extract reverse strand but I could not figure out how to use that.

Does anybody know how to run this command for reverse strand in samtools?

My regions are like:

 LG2:124522-124572 (Forward)
 LG3:250022-250072 (Reverse)
 LG29:4822278-4822318 (Reverse)
 LG12:2,595,915-2,596,240 (Forward)
 LG16:5,405,500-5,405,828 (Reverse)

Upvotes: 0

Views: 904

Answers (2)

JeffZheng
JeffZheng

Reputation: 1405

just want to mention that you probably need to update your samtools to the latest version if you met problem. In my case, samtools V1.2 didn't work, and V1.10 worked.

Upvotes: 0

Richard Michael
Richard Michael

Reputation: 1633

As you noticed, samtools has the option --reverse-complement (or -i) to output the sequence from the reverse strand.

As far as I know, samtools does not support a region notation which permits specifying the strand.

A quick solution would be to separate your region file into forward and reverse locations and run samtools twice.

The steps below are rather verbose, just so the steps are clear. It's fairly straight-forward to clean this up with process substitution in bash, for example.

# Separate the strand regions.

# Use grep and sed twice, or awk (below).
grep -F '(Forward)' regions.txt | sed 's/ (Forward)//' > forward-regions.txt
grep -F '(Reverse)' regions.txt | sed 's/ (Reverse)//' > reverse-regions.txt

# Above as an awk one-liner.
awk '{ strand=($2 == "(Forward)") ? "forward" : "reverse"; print $1 > strand"-regions.txt" }' regions.txt

# Run samtools, marking the strand as +/- in the FASTA output.
samtools faidx ref.fa -r forward-regions.txt --mark-strand sign -o forward-sequences.fa 
samtools faidx ref.fa -r reverse-regions.txt --mark-strand sign -o reverse-sequences.fa --reverse-complement

# Combine the FASTA output to a single file.
cat forward-sequences.fa reverse-sequences.fa > sequences.fa
rm forward-sequences.fa reverse-sequences.fa

Upvotes: 1

Related Questions