Reputation: 69
I've successfully captured data with this:
/^.{144}(.{15}).{34}(.{1})/
which results in this:
TTGGCCCCCACTCTC T
I want to remove the same characters from the same locations. I tried a simple substitution:
s/^.{144}(.{15}).{34}(.{1})//
That removes everything the pattern matches. How do I remove only the parts in parentheses (...)?
Upvotes: 3
Views: 556
Reputation: 66881
The replacement side of s/// is substituted for everything that was matched (though there are ways to alter this to some extent). So you need to capture the things you intend to keep as well, and put them back on the replacement side, like
$var =~ s/^(.{144})(.{15})(.{34})(.)(.*)/$1$3$5/;
(the last capture was added in a comment) or
$var =~ s/^(.{144})\K(.{15})(.{34})(.)(.*)/$3$5/;
Now the 15 chars and the single char are removed from $var, while you still have all of $1 through $5 available to work with as needed. (In the second version, \K keeps everything matched before it from being replaced, so we don't need $1 on the replacement side.) Please see perlretut for details.
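A minimal runnable sketch of both substitutions. The real data isn't shown, so this assumes a made-up 200-character string: 144 'A's, the 15 characters to drop ('C' x 15), 34 'G's, the single character to drop ('T'), and a tail 'NNNNNN' that should survive. Both versions should leave the same 184-character string.

```perl
use strict;
use warnings;

# Made-up test input (200 chars): 144 'A', 15 'C' (to drop), 34 'G',
# one 'T' (to drop), then a tail that must survive.
my $orig = ('A' x 144) . ('C' x 15) . ('G' x 34) . 'T' . 'NNNNNN';

# Version 1: capture everything, put back only groups 1, 3 and 5.
(my $v1 = $orig) =~ s/^(.{144})(.{15})(.{34})(.)(.*)/$1$3$5/;

# Version 2: \K keeps what was matched before it (group 1) from being
# replaced, so only $3 and $5 go into the replacement.
(my $v2 = $orig) =~ s/^(.{144})\K(.{15})(.{34})(.)(.*)/$3$5/;

print length($v1), ' ', ($v1 eq $v2 ? 'same' : 'different'), "\n";   # 184 same
```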
However, as a comment points out, there is a problem with this: it is not known before runtime which groups need to be kept! So it could be 1, 3 and 5, or perhaps 2 and 4 (or 7 and 11?).
What needs to be kept becomes known only at runtime, and it has to be set before the regex runs.
One way to do that: once the list of capture groups to keep is known, store their indices in an array, then capture all matches into an array† and form the replacement by hand, rewriting the string:
my @keep_idx = qw(0 2 4); # indices of capture groups to keep
my @captures = $var =~ /^(.{144})(.{15})(.{34})(.)(.*)/;
# Rewrite the variable using only @keep_idx -indexed captures
$var = join '', grep { defined } @captures[@keep_idx];
# Use @captures as needed...
The code above simply uses grep to filter out any possibly nonexistent "captures": a pattern may allow for a variable number of capture groups (so there may not be a group #5, for example). But I'd rather check those @captures explicitly (were there as many as expected? were they all of the expected form? etc.). Also note that if the match fails, @captures is empty and $var ends up as an empty string, so check that the match succeeded before rewriting.
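A sketch of that approach wrapped in a helper that also does some of the explicit checks: it leaves the string alone when the match fails, and dies on a group index that doesn't exist. The name keep_groups, the index list and the test string are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: match $re against $str and rebuild the string from
# the capture groups whose 0-based indices are listed in @keep_idx.
# Returns undef (in scalar context) when the pattern does not match.
sub keep_groups {
    my ($str, $re, @keep_idx) = @_;
    my @captures = $str =~ $re
        or return;                       # no match: caller keeps its string
    for my $i (@keep_idx) {
        die "no capture group at index $i\n" if $i > $#captures;
    }
    # Groups that did not participate in the match are undef; use ''.
    return join '', map { defined $_ ? $_ : '' } @captures[@keep_idx];
}

my @keep_idx = (0, 2, 4);               # known only at runtime in practice
my $re  = qr/^(.{144})(.{15})(.{34})(.)(.*)/;
my $var = ('A' x 144) . ('C' x 15) . ('G' x 34) . 'T' . 'NNNNNN';

my $new = keep_groups($var, $re, @keep_idx);
$var = $new if defined $new;            # only rewrite on a successful match
print length($var), "\n";               # 184

my $short = keep_groups('too short', $re, @keep_idx);
print defined $short ? "matched\n" : "no match, string left alone\n";
```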
There are other ways to do this.‡
† In newer perls (from version 5.25.7) there is the predefined variable @{^CAPTURE} with all captures, so one can run the match $var =~ /.../; and then use it. No need to assign captures.
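For reference, a sketch of the @{^CAPTURE} variant. It needs perl 5.26 or later; @keep_idx and the test string are illustrative assumptions. Doing the match in the if condition means $var is only rewritten when the match succeeds.

```perl
use strict;
use warnings;
use 5.026;   # @{^CAPTURE} needs perl 5.25.7+, i.e. 5.26 for stable releases

my @keep_idx = (0, 2, 4);   # illustrative; would be decided at runtime
my $var = ('A' x 144) . ('C' x 15) . ('G' x 34) . 'T' . 'NNNNNN';

if ($var =~ /^(.{144})(.{15})(.{34})(.)(.*)/) {
    # @{^CAPTURE} holds ($1, $2, ...) from the last successful match.
    my @captures = @{^CAPTURE};
    $var = join '', @captures[@keep_idx];
}
print length($var), "\n";   # 184
```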
‡ I'd like to mention one way that may be tempting, and can be seen around, but is best avoided.
One can form a string for the replacement side and double-evaluate it, like so
my $keep = q($1.$3.$5); # perl *code*, concatenating variables
$var =~ s/.../$keep/ee; # DANGEROUS. Runs any code in $keep
Here the /ee modifiers evaluate the right-hand side twice: the first evaluation yields the string in $keep, and the second runs that string as Perl code. This exposes the program to evaluating any code that may have been slipped into $keep. Search for this for more information, but I'd say it's best not to use it where it matters.
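If the groups to keep are only known at runtime, a single /e is one way to avoid the double evaluation: the replacement is then Perl code written in the program itself, and only the indices in @keep_idx (plain data) vary. This is a different technique from the /ee one above, shown as a minimal sketch with a made-up test string.

```perl
use strict;
use warnings;

my @keep_idx = (0, 2, 4);   # runtime choice, but only data (indices), never code
my $var = ('A' x 144) . ('C' x 15) . ('G' x 34) . 'T' . 'NNNNNN';

# Single /e: the replacement is Perl code written here in the source. It
# takes a list slice of the captures, so no string from outside is eval'ed.
$var =~ s/^(.{144})(.{15})(.{34})(.)(.*)/join '', ($1, $2, $3, $4, $5)[@keep_idx]/e;

print length($var), "\n";   # 184
```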
Upvotes: 3
Reputation: 69
Thanks for everyone's help. I don't get how the comments work and kept fouling those up. I've decided that the cleanest (if not the most elegant) way is to create two patterns. I'm keeping the other solutions for future study. Here is a different example.
The list of data I want to note, then delete:
/.{41}.{24}(\D{4}).{63}.{16}(\D{2}).{22}.{228}/
Data I want to keep:
/(.{41})(.{24})\D{4}(.{63})(.{16})\D{2}(.{22})(.{228})/
It's genetic data I'm working with. I need to note the insertions, then delete them to re-establish the original positions for alignment purposes.
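A minimal sketch of the two-pattern approach, using the two patterns exactly as given above. The helper name note_and_delete and the 400-character test sequence are hypothetical. Note that \D matches any non-digit, so with sequence data the insertions are located purely by position, not by content.

```perl
use strict;
use warnings;

# The two patterns from the post: one notes the insertions, one keeps the rest.
my $note_re = qr/.{41}.{24}(\D{4}).{63}.{16}(\D{2}).{22}.{228}/;
my $keep_re = qr/(.{41})(.{24})\D{4}(.{63})(.{16})\D{2}(.{22})(.{228})/;

# Hypothetical helper: returns (\@insertions, $cleaned), or an empty list
# if the sequence does not fit the expected 400-character layout.
sub note_and_delete {
    my ($seq) = @_;
    my @insertions = $seq =~ $note_re
        or return;
    (my $cleaned = $seq) =~ s/$keep_re/$1$2$3$4$5$6/
        or return;
    return (\@insertions, $cleaned);
}

# Made-up sequence: a 4-base insertion 'TTTT' at offset 65 and a 2-base
# insertion 'GG' at offset 148, everything else 'A' (400 chars in total).
my $seq = ('A' x 65) . 'TTTT' . ('A' x 79) . 'GG' . ('A' x 250);

my ($ins, $cleaned) = note_and_delete($seq);
print "insertions: @$ins\n";        # insertions: TTTT GG
print length($cleaned), "\n";       # 394
```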
If I understand correctly, I need to upvote this to close. An idiot such as myself can only do what he can do. I'll try. :)
Upvotes: 0
Reputation: 1452
Substitution works like
s/match/replace/
So it will replace your complete "match" with "replace". If you want to keep part of your match, you must put references to its groups ($1, $2, ...) in the replacement string.
s/^.{144}(.{15}).{34}(.{1})// # replace all with nothing
s/^.{144}(.{15}).{34}(.{1})/$1/ # replace all with group 1 (.{15}) -> not what you want
s/^(.{144}).{15}(.{34}).{1}/$1$2/ # keeps groups 1 and 2; removes the .{15} between them and the .{1} after them
The last one is what you need.
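To see the difference, here is a sketch running all three substitutions on a made-up test string that has two extra characters 'NN' after the part the patterns match. None of the three touches characters after the match.

```perl
use strict;
use warnings;

# Made-up input: 144 'A', 15 'C', 34 'G', one 'T', then 'NN' after the match.
my $in = ('A' x 144) . ('C' x 15) . ('G' x 34) . 'T' . 'NN';

(my $r1 = $in) =~ s/^.{144}(.{15}).{34}(.{1})//;      # only 'NN' is left
(my $r2 = $in) =~ s/^.{144}(.{15}).{34}(.{1})/$1/;    # 15 'C' then 'NN'
(my $r3 = $in) =~ s/^(.{144}).{15}(.{34}).{1}/$1$2/;  # 144 'A', 34 'G', 'NN'

print "$r1\n";                  # NN
print length($r3), "\n";        # 180
```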
Try regex101: you can enter a pattern and it shows you the groups. It has a debugger, too.
Upvotes: 4