user1428649

Reputation: 93

Why isn't the perl regex being as greedy as I need it to be?

Consider this string:

!NAME: "Slot 10 SubSlot 0"

There may be some stuff after the final quote mark, but that is irrelevant to the task at hand.

The goal is to capture everything after Slot, up until the final quote mark.

I have tried two regexes for the task:

/^!NAME:\s+\".*(Slot[\w|\s|\d+]+)\"/;

The other:

/^!NAME:\s+\".*(Slot.+)\"/;

But these only capture

Slot 0

What comes after Slot can be wildly different. It could be anything like:

'Slot 4' (this works, but the capture string will not always be this small)

'Slot 4 Subslot 12 Internal Subslot 14 External'

'Slot 75 Internal Slot 12 External'

The only thing that we know for certain is that the section we want will begin with 'Slot', and will end with a quotation mark. Anything else in between is up in the air.

What is wrong with the regexes I have shown, especially the second one? I thought that the '.' operator was greedy and would capture as much as it could.

The purpose of this script is to capture these details to be parsed in another program.

Upvotes: 1

Views: 75

Answers (5)

ikegami

Reputation: 386206

The safest answer:

/^ !NAME: \s* " (?:(?!Slot).)* Slot ( [^"]* ) "/x

You could also make sure that Slot is not part of another word:

/^ !NAME: \s* " (?:(?!Slot).)* \b Slot \b ( [^"]* ) "/x

The trick is knowing that (?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
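As a rough illustration of the first pattern above, here is a minimal sketch. The input line is hypothetical: it assumes some extra text before the first Slot and a second quoted field after the closing quote, neither of which appears in the question. The variable names are also made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input: text before the first "Slot" and another quoted
# field after the closing quote (neither is in the original question).
my $line = '!NAME: "Chassis 1 Slot 10 SubSlot 0" DESCR: "example"';

# (?:(?!Slot).)* consumes characters one at a time, but only while the
# text at the current position does not start with "Slot", so it stops
# right before the FIRST "Slot". [^"]* then stops at the first quote.
my ($after_slot) = $line =~ /^ !NAME: \s* " (?:(?!Slot).)* Slot ( [^"]* ) "/x;

printf "captured: '%s'\n", $after_slot;   # captured: ' 10 SubSlot 0'
```

Note that the capture group sits after the literal Slot, so the captured text begins with the space that follows it.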

Upvotes: 1

dtanders

Reputation: 1845

This should capture everything that's not a quote that comes after Slot but before the quote:

/^!NAME:\s+\"Slot([^\"]*)\"/

And to include the Slot part, if you need it for some reason:

/^!NAME:\s+\"(Slot[^\"]*)\"/

Upvotes: 1

Mike Covington

Reputation: 2157

Here is a simple solution:

/(Slot[^"]+)/

Here it is in action:

my $s = '!NAME: "Slot 10 SubSlot 0"';
$s =~ /(Slot[^"]+)/;
print $1;

# Slot 10 SubSlot 0

If you need to specify that the line begins with !NAME:, then just expand it to this:

/^!NAME:\s"(Slot[^"]+)/

Upvotes: 0

miken32

Reputation: 42709

This works with all your example text:

^!NAME:\s*"(Slot.*?)"

https://regex101.com/r/hB1cT3/2

NB: All your examples contain nothing in quotes except the "Slot" text, so why are you putting a .* as the first thing inside the quotes? As mob pointed out in another answer, that is what was causing the problem. I've removed it here.
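A quick way to check this is to run the pattern over the sample values from the question. This is only a sketch: it assumes each sample value arrives wrapped in the same !NAME: "..." format as the first line, and the array names are invented for the example.

```perl
use strict;
use warnings;

# Sample values from the question. Wrapping the last three in the
# !NAME: "..." format is an assumption made for this sketch.
my @lines = (
    '!NAME: "Slot 10 SubSlot 0"',
    '!NAME: "Slot 4"',
    '!NAME: "Slot 4 Subslot 12 Internal Subslot 14 External"',
    '!NAME: "Slot 75 Internal Slot 12 External"',
);

my @captures;
for my $line (@lines) {
    # .*? is non-greedy, so the capture ends at the first closing quote.
    if ( $line =~ /^!NAME:\s*"(Slot.*?)"/ ) {
        push @captures, $1;
        print "$1\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

Each line should print the full text between the quotes, starting at the first Slot.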

Upvotes: 0

mob

Reputation: 118635

It is being greedy.

/^!NAME:\s+\".*(Slot[\w|\s|\d+]+)\"/;
             ^^
              |----- The greedy part is here.

Since your target string matches Slot \d+ in two places, the .* after the quote slurps up the first one. Try making that part of the expression non-greedy:

/!NAME:\s+\".*?(Slot(?:\w|\s|\d+)+)\"/
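To see the difference side by side, here is a small sketch that runs the question's second regex against the sample line, once with the greedy .* and once with a non-greedy .*? in front of the capture group. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = '!NAME: "Slot 10 SubSlot 0"';

# Greedy .* first consumes the rest of the line, then backtracks only
# as far as it must, so the capture starts at the LAST "Slot"
# (the one inside "SubSlot").
my ($greedy) = $line =~ /^!NAME:\s+".*(Slot.+)"/;

# Non-greedy .*? grows only as far as it must, so the capture starts
# at the FIRST "Slot".
my ($lazy) = $line =~ /^!NAME:\s+".*?(Slot.+)"/;

print "greedy: $greedy\n";   # greedy: Slot 0
print "lazy:   $lazy\n";     # lazy:   Slot 10 SubSlot 0
```

One caveat: the .+ inside the capture group is still greedy, so if the text after the closing quote contains another quote mark, the capture would run up to that later quote. Using [^"]+ inside the group, as other answers do, avoids this.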

Upvotes: 2
