Reputation: 1433
I have a RegEx that is working for me but I don't know WHY it is working for me. I'll explain.
RegEx: \s*<in.*="(<?.*?>)"\s*/>\s*
Text it finds (it finds the white-space before and after the input tag):
<td class="style9">
<input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>" /> </td>
</tr>
The part I don't understand:
<in.*=" <--- As I understand it, this should only match up to the first =", i.e. it should only find <input name="
It actually finds: <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value=" which happened to be what I was trying to do.
What am I not understanding about this RegEx?
Upvotes: 6
Views: 244
Reputation: 21
Your greedy approach is causing confusion. You want .*?
Consider the input 101000000000100.
Using 1.*1, the .* is greedy: it will match all the way to the end, and then backtrack until it can match 1, leaving you with 1010000000001.
.*? is non-greedy: it first matches nothing, and then tries to match extra characters one at a time until it can match 1, eventually matching 101.
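A quick way to check this is a minimal Python sketch. Python's re module is used only for illustration. It is a backtracking engine like PHP's preg_* functions, so the two patterns behave the same way there:

```python
import re

s = "101000000000100"

# Greedy: .* runs to the end of the string, then backtracks to the LAST 1.
greedy = re.search(r"1.*1", s).group()

# Non-greedy: .*? starts empty and grows one character at a time
# until the next character is a 1.
lazy = re.search(r"1.*?1", s).group()

print(greedy)  # 1010000000001
print(lazy)    # 101
```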
Upvotes: 2
Reputation: 9570
As I understand it, this should only find up to the first =" as in it should only find <input name="
You don't say what language you're writing in, but in almost all regular expression systems the * and + quantifiers are "greedy" by default: they match as much of the input as they can while still letting the rest of the pattern succeed. In your case, that means .* matches everything from the start of the input tag up to the last =" sequence.
Most regex systems have a way to specify that a quantifier should match as little as possible instead of as much as possible. This is called "non-greedy" (or "lazy") matching, and in Perl-style engines it is written by adding ? after the quantifier, as in .*?.
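To see both behaviors on the question's own input, here is a minimal sketch using Python's re module. The language is an assumption, since the question doesn't state one, but PHP's preg_* functions treat these two patterns the same way:

```python
import re

# The input tag from the question; the PHP inside value="..." is just text here.
html = '<input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>" />'

# Greedy .* backtracks only as far as the LAST =" in the line.
greedy_match = re.search(r'<in.*="', html).group()

# Non-greedy .*? stops at the FIRST =" it reaches.
lazy_match = re.search(r'<in.*?="', html).group()

print(greedy_match)  # <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="
print(lazy_match)    # <input name="
```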
As an aside, don't assume the first attribute will be name= unless you have full control over the construction of the input. Both HTML and XML allow attributes to be specified in any order.
Upvotes: 3
Reputation: 57354
You appear to be using 'greedy' matching.
Greedy matching says "eat as much as possible to make this work"
Try
<in[^=]*=
for starters: that will stop it from matching the "=" as part of ".*".
But in future, you might want to read up on the .*? and .+? notation, which stops at the first possible match instead of the last.
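Here is a minimal Python sketch of both suggestions, using Python's re module for illustration. The tag is a shortened copy of the question's input, and the 'abab' string is a made-up example. It also shows the one difference between .*? and .+?: the latter must consume at least one character.

```python
import re

tag = '<input name="guarantor4" id="guarantor4" size="50" type="text" />'

# [^=]* can never consume an '=', so the match must end at the first '='.
negated = re.search(r'<in[^=]*=', tag).group()

# .*? may match zero characters; .+? must match at least one.
lazy_star = re.search(r'a.*?b', 'abab').group()
lazy_plus = re.search(r'a.+?b', 'abab').group()

print(negated)    # <input name=
print(lazy_star)  # ab
print(lazy_plus)  # abab
```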
Non-greedy syntax would be the better choice if you wanted to stop only when you saw TWO characters in sequence,
i.e.:
<in.*?=id
which would stop at the first '=id' regardless of whether there are other '=' characters in between.
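Here is a minimal Python sketch of that two-character case. The input string is hypothetical, chosen so that several '=' signs appear before the first '=id'. A negated class like [^=]* only excludes single characters, so it cannot get past the first '=' and fails to match at all here.

```python
import re

# Hypothetical input: several '=' signs appear before the first '=id'.
text = '<input a=1 b=2 c=id d=id'

lazy = re.search(r'<in.*?=id', text)       # stops at the first '=id'
greedy = re.search(r'<in.*=id', text)      # runs on to the last '=id'
negated = re.search(r'<in[^=]*=id', text)  # [^=]* stops at the first '=', which is followed by '1'

print(lazy.group())    # <input a=1 b=2 c=id
print(greedy.group())  # <input a=1 b=2 c=id d=id
print(negated)         # None
```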
Upvotes: 8
Reputation: 4956
.* is greedy, so it'll find up to the last =. If you want it non-greedy, add a question mark, like so: .*?
Upvotes: 4
Reputation: 63529
.* is greedy. You want .*? to find up to only the first =.
Upvotes: 8