Reputation: 480
I am trying to reverse engineer a Perl script. One of the lines contains a matching operator that reads:
$line =~ /^\s*^>/
The input is just FASTA sequences with header information. The script is looking for a particular pattern in the header, I believe.
Here is an example of the files the script is applied to:
>mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131 5'pad=0 3'pad=0 strand=+
repeatMasking=none
ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC
CCTGCGG
>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419 5'pad=0 3'pad=0 strand=+
repeatMasking=none
GACCCTGCTGGGATTTTTGAGCTGGTGGAAGTGGTTGGAAATGGCACCTA
TGGACAAGTCTATAAG
This is a matching operator asking whether the line, from its beginning, contains zero or more whitespace characters, but after that I lose its meaning.
This is how I have parsed the regex so far:
from the beginning [ /^... ], contains whitespace [ ...\s... ], zero or more times [ ...*... ].
Upvotes: 2
Views: 151
Reputation: 1304
It is much easier to reverse engineer a Perl script with a debugger: "perl -d script.pl" or, if you have ddd on Linux, "ddd script.pl &".
With multiline matching (the /m flag), this regex matches blank or whitespace-only lines followed by the start of the next FASTA record. http://www.rexfiddle.net/c6locQg
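To see the difference the /m flag makes, here is a minimal sketch. The sub name matches_header_start and the sample string are made up for illustration; whether the original script ever applies the regex to a multi-line string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the regex from the question with or without /m.
sub matches_header_start {
    my ($text, $multiline) = @_;
    if ($multiline) {
        # With /m, ^ can also match right after a newline inside the string.
        return $text =~ /^\s*^>/m ? 1 : 0;
    }
    # Without /m, both ^ anchors can only match at the very start of the string.
    return $text =~ /^\s*^>/ ? 1 : 0;
}

# Made-up input: a sequence line, a whitespace-only line, then a header.
my $text = "CCTGCGG\n   \n>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419\n";

print "without /m: ", matches_header_start($text, 0), "\n";
print "with /m:    ", matches_header_start($text, 1), "\n";
```

Without /m the string starts with "C", so nothing matches. With /m the match can start at the whitespace-only line and end at the ">" of the following header.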
Upvotes: 2
Reputation: 3863
Using RegexBuddy (or, as r3mus said, regex101.com, which is free):
Assert position at the beginning of the string «^»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the beginning of the string «^»
Match the character “>” literally «>»
EDIT: Birei's answer is probably more correct if the regex in question is actually wrong.
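The same breakdown can be written into the regex itself with the /x flag, which lets you add whitespace and comments. This is only a sketch: the names $header_re and is_header are invented here, and the "indented" test line is not from the question's data.

```perl
use strict;
use warnings;

# The regex from the question, spelled out token by token.
my $header_re = qr{
    ^      # start of the string
    \s*    # zero or more whitespace characters, greedy
    ^      # start of the string again, since there is no m flag
    >      # a literal ">"
}x;

# Hypothetical helper: returns 1 if the line matches, 0 otherwise.
sub is_header {
    my ($line) = @_;
    return $line =~ $header_re ? 1 : 0;
}

for my $line (
    ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131",
    "ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC",
    "  >indented",
) {
    print is_header($line) ? "header:     " : "not header: ", $line, "\n";
}
```

Because the second ^ can only succeed at position 0 when /m is absent, \s* is forced to match nothing, so a line with whitespace before the ">" is not reported as a header.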
Upvotes: 2
Reputation: 36262
You have to get rid of the second ^ character. It is a metacharacter that anchors the match at the beginning of the string (or of each line, with a special flag like /m), but the first ^ already does that.
The character > will still match at the beginning of the line without the second ^ because the initial whitespace is optional (* quantifier). So, use:
$line =~ /^\s*>/
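A quick way to compare the two forms on sample lines is sketched below. The sub names and the indented header line are hypothetical. Note that the two regexes agree on the data shown in the question but differ on a header with leading whitespace, so which one you want depends on whether such lines should count (an assumption about the original author's intent).

```perl
use strict;
use warnings;

# Regex as written in the script being reverse engineered.
sub original_match   { return $_[0] =~ /^\s*^>/ ? 1 : 0 }

# Regex with the second ^ removed, as suggested above.
sub simplified_match { return $_[0] =~ /^\s*>/  ? 1 : 0 }

my @lines = (
    ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131",
    "   >hypothetical_indented_header",
    "ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC",
);

for my $line (@lines) {
    print "original=", original_match($line),
          " simplified=", simplified_match($line),
          " line=[$line]\n";
}
```

Both forms accept the first line and reject the sequence line. Only the simplified form accepts the indented header.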
Upvotes: 2