neversaint

Extracting substring given a position of reference

Given a string, a reference position, and a length:

my $string = "AAAAAAAATGAAAAAAAA";
my $ref_pos = 10;
my $length = 5;

I'd like to extract the substring covering +/-5 bp around the reference position (5 bases on either side of it), yielding: AAAATGAAAAA

In the example above, ref_pos (counting from 1) corresponds to the G, and we extract 5 bp on either side of that G.
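To make the indexing concrete: Perl's `substr` counts from 0, so a 1-based `$ref_pos` of 10 is index 9. A minimal sketch of the arithmetic for the first example, assuming `$ref_pos` is 1-based and the window does not run off either end of the string:

```perl
use strict;
use warnings;

my $string  = "AAAAAAAATGAAAAAAAA";
my $ref_pos = 10;    # 1-based position of the G (assumption)
my $length  = 5;     # bases to keep on each side

my $idx   = $ref_pos - 1;         # 0-based index of the G (9)
my $start = $idx - $length;       # first base of the window (4)
my $width = 2 * $length + 1;      # 5 left + the G + 5 right = 11

my $window = substr($string, $start, $width);
print "$window\n";                # AAAATGAAAAA
```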

If the window extends past either end of the string, we report all the bases that are available. For example:

my $string2 = "AAAAAAAATGCCC";
my $ref_pos = 10;
my $length = 5;

Will yield: AAAATGCCC

What's the best way to do this in Perl?
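One way to get the "report all the bases" behaviour is to clamp both ends of the window before calling `substr`. A minimal sketch, assuming a 1-based `$ref_pos` as in the examples; `extract_window` is a hypothetical helper name, and `max`/`min` come from the core `List::Util` module:

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: return up to $flank bases on each side of the
# 1-based position $ref_pos, clipped to the ends of $seq.
sub extract_window {
    my ($seq, $ref_pos, $flank) = @_;
    my $idx   = $ref_pos - 1;                          # 0-based centre
    my $start = max(0, $idx - $flank);                 # clip at the left end
    my $end   = min(length($seq), $idx + $flank + 1);  # exclusive; clip at the right end
    return substr($seq, $start, $end - $start);
}

print extract_window("AAAAAAAATGAAAAAAAA", 10, 5), "\n";  # AAAATGAAAAA
print extract_window("AAAAAAAATGCCC", 10, 5), "\n";       # AAAATGCCC
```

Because `$end` is exclusive, `$end - $start` is exactly the number of bases that survive the clipping on either side.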

Answers (1)

Jim Garrison

I think your question is really about how to determine the start/end positions when the window can extend past the start or end of the string. Here's one way in pseudocode:

str = string
p   = desired offset
len = desired length

start = max(0,p-(len/2))
end   = min(str.length, max(start+len, p+(len/2)))

The start position should be the desired offset minus half the desired length, but it can never be less than zero. Once you've fixed the start position, calculate the end as either (the desired offset plus half the desired length) or (the start plus the desired length), whichever is larger. Finally, limit the end so it never goes past the end of the string.

Note that end is one character beyond the last character of the result.
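The pseudocode maps almost directly onto Perl's `substr`, which takes an offset and a length, so the substring length is simply `$end - $start`. A rough translation, assuming `$p` is a 0-based offset and `$len` is the total window length; `window_bounds` is a hypothetical name, and `int($len / 2)` just truncates rather than settling the odd-length case:

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Translation of the pseudocode (hypothetical sub name).
# $p is a 0-based offset and $len is the *total* desired length.
sub window_bounds {
    my ($str, $p, $len) = @_;
    my $half  = int($len / 2);                  # truncates odd lengths
    my $start = max(0, $p - $half);             # never before the string
    my $end   = min(length($str), max($start + $len, $p + $half));
    return ($start, $end);                      # $end is exclusive
}

my $s = "AAAAAAAATGCCC";
my ($start, $end) = window_bounds($s, 9, 11);   # 5 bp + the G + 5 bp
print substr($s, $start, $end - $start), "\n";  # AAAATGCCC
```

Note that when the start gets clamped to zero, the `max($start + $len, ...)` term pushes the end further right so the result still has `$len` characters where the string allows it; for example, `window_bounds("ATGCATGCAT", 1, 6)` returns `(0, 6)`.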

Dealing with an odd "desired length" is left as an exercise.
