Reputation: 1400
I'm trying to remove tokens from a semicolon-separated string using a regex. Example strings look as follows:
Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali
Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;
I want to remove the "Uid" and "Pwd" tokens in separate commands, so as not to remove any trailing tokens (e.g. Field4 should remain at the end).
My current attempt is to do:
$mystring =~s /Uid=.+;//i;
which yields
Field1=Blah;Field2=Bluh;Field3=Dingdong;Field4=lalali
This works for the first line, but not for the second line with the semicolon at the end, where it yields
Field1=Blah;Field2=Bluh;Field3=Dingdong;
and removes Field4 incorrectly. I tried a number of variations like
$mystring =~s /Uid=.+;?//i;
$mystring =~s /Uid=.+;+?//i;
without success. I realize that I need to tell the regex to match only up to the first semicolon, but I can't figure out how.
NOW, just so that I don't look completely stupid, I was able to get it to work by doing this:
$mystring =~s /Uid=[^;]+;//i;
but I'm still wondering why I can't tell the expression to only match up to the first semicolon ...
Upvotes: 0
Views: 203
Reputation: 753515
If you don't want to use the negated character class (which will work with most regex packages), you can use a non-greedy quantifier to match the data following the keyword (but it will only work with Perl-compatible regex packages). See Quantifiers under Regular expressions for more information.
$mystring =~s /Uid=.+?;//i;
The extra question mark makes the + non-greedy; it takes the minimum string that will match instead of the maximum, so the match stops at the first semicolon instead of running on to the last one.
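As a rough check of the non-greedy form, here is a minimal sketch that applies it to both sample strings from the question, once for Uid and once for Pwd. The helper name strip_token is hypothetical and only there to avoid repeating the substitution.

```perl
use strict;
use warnings;

# The two sample strings from the question: without and with a trailing semicolon.
my @samples = (
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali',
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;',
);

# strip_token is a hypothetical helper, not part of any library.
sub strip_token {
    my ($string, $name) = @_;
    # .+? grows one character at a time, so it stops at the first semicolon.
    $string =~ s/\Q$name\E=.+?;//i;
    return $string;
}

# Remove Uid first, then Pwd, as two separate substitutions.
my @results = map { strip_token(strip_token($_, 'Uid'), 'Pwd') } @samples;
print "$_\n" for @results;
```

In both cases Field4 should survive, with the trailing semicolon kept on the second string.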
Upvotes: 3
Reputation: 57590
Quantifiers like + and * are greedy. They gobble up as many characters as possible, and give them back only when backtracking forces them to. The pattern .*; will therefore match everything up to the last semicolon.
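To make the greedy behaviour visible, the following sketch captures what the asker's original pattern actually consumes. The input is a shortened version of the second sample string, chosen here only for illustration.

```perl
use strict;
use warnings;

# Shortened variant of the question's second sample (trailing semicolon present).
my $string = 'Field1=Blah;Uid=John;Pwd=secret;Field4=lalali;';

# The parentheses capture exactly the text the greedy pattern matches.
my ($greedy) = $string =~ /(Uid=.+;)/i;

# .+ first runs to the end of the string, then backtracks just far enough
# for the final ; in the pattern to match, which is the last semicolon.
print "$greedy\n";   # Uid=John;Pwd=secret;Field4=lalali;
```

This is why the substitution in the question also wipes out Pwd and Field4 when the string ends with a semicolon.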
Maybe the greedy quantifiers should go on a diet. We can force them to by using lazy versions: +? and *?. These will terminate as early as possible. So the pattern would be:
/Uid=.+?;/ # repeat for Pwd
which matches only up to the first semicolon.
This works, but it is considered better style to use a negated character class rather than a non-greedy quantifier on the . class:
/Uid=[^;]+;/
because there are fewer ways this can go wrong (like deleting the rest of the line). It is also more explicit than the other solution.
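One case where the two approaches differ is an empty value. The sketch below uses a hypothetical input in which Uid has no value; this input is an assumption for illustration and does not appear in the question. The [^;]* variant is likewise an addition here, shown only to handle the empty value.

```perl
use strict;
use warnings;

# Hypothetical input: the Uid token is present but its value is empty.
my $input = 'Field1=Blah;Uid=;Pwd=secret;Field4=lalali';

# Lazy dot: .+? must consume at least one character, so it swallows the
# semicolon right after "Uid=" and runs on to the next one, taking Pwd with it.
(my $lazy = $input) =~ s/Uid=.+?;//i;

# Negated class with +: cannot cross a semicolon, so nothing matches
# and the string is left unchanged.
(my $negated_plus = $input) =~ s/Uid=[^;]+;//i;

# Negated class with *: also accepts an empty value and removes only "Uid=;".
(my $negated_star = $input) =~ s/Uid=[^;]*;//i;

print "$lazy\n";           # Field1=Blah;Field4=lalali
print "$negated_plus\n";   # Field1=Blah;Uid=;Pwd=secret;Field4=lalali
print "$negated_star\n";   # Field1=Blah;Pwd=secret;Field4=lalali
```

The negated class never removes more than one token, whichever quantifier is used; the lazy dot can.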
Upvotes: 4