Reputation: 2498
I'm trying to process an entire string but the way my code is written, part of it is not being processed. Here's a representation of my code:
#!/usr/bin/perl
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g; # remove white space from string
# split the string into fragments of 58 characters and store in array
my @array = $string =~ /[A-Z]{58}/g;
my $len = scalar @array;
print $len . "\n"; # this prints 3
# print the fragments
print $array[0] . "\n";
print $array[1] . "\n";
print $array[2] . "\n";
print $array[3] . "\n";
The code outputs the following:
3
MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD
PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF
VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL
<blank space>
Notice that the rest of the string, EDAVTKPELRPCPTP, is not stored in @array. When I'm creating my array, how do I store EDAVTKPELRPCPTP? Perhaps I could store it in $array[3]?
Upvotes: 0
Views: 113
Reputation: 126762
You may prefer to use unpack, like this:
$string =~ s/\s+//g;
my @fragments = unpack '(A58)*', $string;
Or if you would rather leave $string unchanged and have v5.14 or better of Perl, then you can write:
my @fragments = unpack '(A58)*', $string =~ s/\s+//gr;
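To see how the '(A58)*' template treats the leftover characters, here is a minimal sketch on a short string with a chunk size of 4. The helper name chunk_with_unpack and the sample input are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): split $text into pieces of at
# most $size characters by building a template such as '(A4)*'.
sub chunk_with_unpack {
    my ($text, $size) = @_;
    # The group (A<size>) repeats until the input is exhausted; the final
    # repetition simply takes whatever characters are left.
    return unpack "(A$size)*", $text;
}

my @pieces = chunk_with_unpack('ABCDEFGHIJ', 4);
print join(',', @pieces), "\n";    # prints ABCD,EFGH,IJ
```

Note that the A format strips trailing whitespace from each piece. That is harmless here because the whitespace has already been removed; if it mattered, the a format would keep the data as-is.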
Upvotes: 2
Reputation: 6553
If you don't actually need regex character classes, this is how I'd do it:
use strict;
use warnings;
use Data::Dump;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g;
my @chunks;
while (length($string)) {
push(@chunks, substr($string, 0, 58, ''));
}
dd($string, \@chunks);
Output:
(
"",
[
"MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD",
"PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF",
"VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL",
"EDAVTKPELRPCPTP",
],
)
Upvotes: 1
Reputation: 29854
What you're missing is the ability to capture fewer than 58 characters. Since you only want to do that at the end of the string, you can do this:
/[A-Z]{58}|[A-Z]{1,57}\z/
Which I would prefer to write like this:
/\p{Upper}{58}|\p{Upper}{1,57}\z/
However, since quantifiers are greedy by default, a single {1,58} will prefer to gather 58 characters and only settle for fewer when it runs out of matching input, so the alternation is not needed:
/\p{Upper}{1,58}/
Or, for the reasons Schwern mentions (such as avoiding any foreign letters):
/[A-Z]{1,58}/
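As a quick check of the greedy behaviour described above, this sketch runs both forms of the pattern on a short string, with a chunk size of 4 standing in for 58. The sample input and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = 'ABCDEFGHIJ';

# Explicit alternation: full chunks of 4, or a shorter chunk at the very end.
my @alternation = $text =~ /[A-Z]{4}|[A-Z]{1,3}\z/g;

# Single greedy quantifier: takes 4 whenever 4 are available, fewer otherwise.
my @greedy = $text =~ /[A-Z]{1,4}/g;

print "@alternation\n";    # prints ABCD EFGH IJ
print "@greedy\n";         # prints ABCD EFGH IJ
```

The two patterns agree here because the input contains only uppercase letters. On input with other characters in the middle they can differ, since the alternation only accepts a short chunk at the end of the string.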
Upvotes: 2
Reputation: 165546
You've almost got it. You need to change your regex to allow for 1 to 58 characters.
my @array = $string =~ /[A-Z]{1,58}/g;
In addition, you have an error in your script using @prot_seq instead of @array. You should always use strict to protect yourself against this sort of thing. Here's the script with strict, warnings, and 5.10 features (to get say).
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
# Strip whitespace.
$string =~ s/\s+//g;
# Split the string into fragments of 58 characters or less
my @fragments = $string =~ /[A-Z]{1,58}/g;
say "Num fragments: ".scalar @fragments;
say join "\n", @fragments;
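To illustrate the point about strict catching a mistyped variable name, here is a small sketch. It compiles the faulty line with a string eval purely so the error can be captured and printed instead of aborting the script; that wrapper and the sample data are assumptions for demonstration, not something the original script does.

```perl
use strict;
use warnings;

my @array = ('AAA', 'BBB');

# The string eval is compiled at run time and inherits `use strict` from the
# enclosing scope, so the undeclared @prot_seq is rejected at compile time.
my $ok    = eval q{ my $len = scalar @prot_seq; 1 };
my $error = $ok ? '' : $@;

print "strict caught: $error" if $error;
```

Without strict, the same line would quietly treat @prot_seq as an empty package variable and report a length of 0.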
Upvotes: 5