jbord39

Reputation: 169

perl regex partial word match

I am trying to remove all words that contain either of two keys (in Perl).

For example, the string

garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2

Should become

garble variable10 variable1

To just remove the normal, unappended/prepended keys is easy:

$var =~ s/(vssx|vddx)/ /g;

However I cannot figure out how to get it to remove the entire xi_21_vssx part. I tried:

$var =~ s/\s.*(vssx|vddx).*\s/ /g;

Which does not work correctly, and I do not understand why. It seems like \s should match the space, then .* matches anything up to one of the patterns, then the pattern itself, then .* matches anything following the pattern until the next space.

I also tried replacing \s (whitespace) with \b (word boundary), but that did not work either. Other attempts:

$var =~ s/ .*(vssx|vddx).* / /g;
$var =~ s/(\s.*vssx.*\s|\s.*vddx.*\s)/ /g;

As well as a few other mungings.

Any pointers/help would be greatly appreciated.

-John

Upvotes: 1

Views: 781

Answers (4)

Marc Anton Dahmen

Reputation: 1091

Try this as the regex:

\b[\w]*(vssx|vddx)[\w]*\b
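The pattern above is given on its own, so here is a minimal sketch of how it might be dropped into a substitution. The whitespace clean-up step (split and join) is an assumption added for illustration and is not part of the answer. Also note that \w only covers letters, digits and underscore, so a word such as foo-vssx would leave foo- behind.

```perl
use strict;
use warnings;

my $var = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';

# Remove every \w-word containing one of the keys.
# The brackets around \w are optional, so \w* is equivalent to [\w]*.
$var =~ s/\b\w*(?:vssx|vddx)\w*\b//g;

# Assumed clean-up: the removed words leave their separating spaces behind,
# so split on whitespace and re-join with single spaces.
$var = join ' ', split ' ', $var;

print "$var\n";    # garble variable10 variable1
```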

Upvotes: 0

ThisSuitIsBlackNot

Reputation: 24063

I am trying to remove all words that [...]

This type of problem lends itself well to grep, which can be used to find the elements in a list that match a condition. You can use split to convert your string to a list of words and then filter it like this:

use strict;
use warnings;
use 5.010;

my $string = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';

my @words = split ' ', $string;

my @filtered = grep { $_ !~ /(?:vssx|vddx)/ } @words;

say "@filtered";

Output:

garble variable10 variable1
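If the list of keys is likely to change, the same split/grep idea could be wrapped in a small helper. This is only a sketch: the name remove_words_with_keys and its argument order are made up for illustration, and quotemeta is used on the assumption that the keys should be matched literally rather than as regex fragments.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every whitespace-separated word that contains
# any of the given keys as a substring.
sub remove_words_with_keys {
    my ($string, @keys) = @_;
    return $string unless @keys;    # nothing to filter on

    # quotemeta escapes regex metacharacters so each key is matched literally.
    my $alternation = join '|', map { quotemeta $_ } @keys;
    my $key_re      = qr/(?:$alternation)/;

    return join ' ', grep { !/$key_re/ } split ' ', $string;
}

my $input   = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';
my $cleaned = remove_words_with_keys($input, 'vssx', 'vddx');

print "$cleaned\n";    # garble variable10 variable1
```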

Upvotes: 0

lightbringer

Reputation: 835

I think the regex will just be

$var =~ s/\S*(vssx|vddx)\S*/ /g;
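Because each removed word is replaced by a space, runs of spaces are left where several words were dropped in a row. A rough sketch of one way to tidy that up afterwards follows; the tr squeeze and the trim are my assumptions, not part of the answer.

```perl
use strict;
use warnings;

my $var = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';

# Substitution from the answer: each matching word becomes a single space.
$var =~ s/\S*(vssx|vddx)\S*/ /g;

# Assumed clean-up: squeeze runs of spaces down to one, then trim both ends.
$var =~ tr/ //s;
$var =~ s/^ +| +$//g;

print "$var\n";    # garble variable10 variable1
```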

Upvotes: 1

Aran-Fey

Reputation: 43166

You can use

\s*\S*(?:vssx|vddx)\S*\s*

The problems with your regex were:

  • The .* should have been non-greedy.
  • The .* in front of (vssx|vddx) mustn't match whitespace characters, so you have to use \S*.
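To make the two points concrete, here is a small sketch that runs the original attempt, a non-greedy version of it, and the \S* pattern on the sample string from the question. The variable names are only for illustration. The non-greedy version shows why the second point matters: .*? is still allowed to cross spaces, so it swallows good words on its way to the first key.

```perl
use strict;
use warnings;

my $string = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';

# Original attempt: greedy .* runs across spaces and grabs almost everything.
(my $greedy = $string) =~ s/\s.*(vssx|vddx).*\s/ /g;

# Non-greedy alone is not enough: .*? still crosses whitespace.
(my $lazy = $string) =~ s/\s.*?(vssx|vddx).*?\s/ /g;

# \S* cannot cross whitespace, so each match stays inside one word.
(my $fixed = $string) =~ s/\s*\S*(?:vssx|vddx)\S*\s*//g;

print "$greedy\n";    # garble xi_blahvssx_grbl_2
print "$lazy\n";      # garble vddx vddx_garble_21 xi_blahvssx_grbl_2
print "$fixed\n";     # garble variable10 variable1
```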

Note that this pattern consumes the whitespace on both sides of the matched word, so with an empty replacement the space between the neighbouring words is lost, i.e. a vssx b will become ab.
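One possible workaround, which is a variation on the pattern and not something from the answer itself, is to consume only the whitespace after each removed word and trim the end of the string afterwards. A minimal sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: remove words containing a key, keeping single
# separators between the surviving words.
sub strip_keyed_words {
    my ($text) = @_;
    # Consume the word and the whitespace that follows it, but leave the
    # whitespace in front of it alone.
    $text =~ s/\S*(?:vssx|vddx)\S*\s*//g;
    # A removed last word leaves its leading separator behind; trim it.
    $text =~ s/\s+\z//;
    return $text;
}

print strip_keyed_words('a vssx b'), "\n";    # a b
print strip_keyed_words('garble variable10 variable1 vssx vddx xi_21_vssx'), "\n";
```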

regex101 demo.

Upvotes: 0
