Reputation: 97
Using the enclosed regex, I'm able to extract the 'model_name' value when nfc_support has value="true" in a few instances. However, I'm unable to get it to match in other instances, as shown below. Any help in getting it to match in both instances would be greatly appreciated.
EX:
<capability name=\"model_name\"[A-Za-z1-9"=();,._/<>\s]*<capability name=\"nfc_support\" value=\"true\"/>
Will work with:
<capability name="model_name" value="T11"/>
<capability name="brand_name" value="Turkcell"/>
<capability name="marketing_name" value="Campaign"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
</group>
But cannot match this:
<capability name="model_name" value="U8650"/>
<capability name="brand_name" value="Huawei"/>
<capability name="marketing_name" value="Sonic"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
Upvotes: 0
Views: 70
Reputation: 666
Your regex will match everything between the first model_name and the last nfc_support = true, because you use the greedy * quantifier. This is a problem if the string you apply the regex to contains multiple occurrences of nfc_support, because the match keeps extending until it reaches the last <capability name="nfc_support" value="true"/>. When the text you want may appear multiple times, it is better practice to use the reluctant (lazy) quantifier *? to avoid matching too much.
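The difference between the two quantifiers can be sketched as follows. This is only an illustration: it uses Python's re module because the thread does not say which regex engine is in use, it swaps the character class for a plain . with dotall mode, and the sample text is an abbreviated, hypothetical version of the data in the question.

```python
import re

# Hypothetical, shortened sample: two devices, both with nfc_support="true".
text = (
    '<capability name="model_name" value="T11"/>\n'
    '<capability name="nfc_support" value="true"/>\n'
    '<capability name="model_name" value="U8650"/>\n'
    '<capability name="nfc_support" value="true"/>\n'
)

NFC_TRUE = r'<capability name="nfc_support" value="true"/>'

# Greedy: .* grabs as much as possible, so the match ends at the LAST nfc_support.
greedy = re.compile(r'<capability name="model_name".*' + NFC_TRUE, re.DOTALL)

# Reluctant (lazy): .*? grabs as little as possible, so each match ends
# at the FIRST nfc_support="true" that follows its model_name.
lazy = re.compile(r'<capability name="model_name".*?' + NFC_TRUE, re.DOTALL)

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(len(greedy_matches))  # one match covering both devices
print(len(lazy_matches))    # one match per device
```

Note that a reluctant quantifier on its own still spans a device whose nfc_support is "false" and stops at the next "true", so it reduces over-matching rather than ruling it out.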
Assuming all lines will follow a format of model_name, brand_name, marketing_name, /group, group id, then nfc_support, a regex that enforces this format is:
(?s)<capability name=\"model_name\" value=\"(.*?)\"/>\n<capability name=\"brand_name\" value=\"(.*?)\"/>\n<capability name=\"marketing_name\" value=\"(.*?)\"/>\n</group>\n<group id=\"chips\">\n<capability name=\"nfc_support\" value=\"true\"/>
Apologies in advance if there are typos in this regex, but you get the gist of it...
This regex stores the values of model_name, brand_name, and marketing_name in groups $1, $2, and $3, respectively, only if nfc_support is "true". The (?s) flag enables dotall mode, so that . also matches line terminators.
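As a rough sketch of how the capture groups could be read out, here is the same pattern applied to the two blocks from the question. Python's re module is an assumption made for illustration (the thread does not name the engine), the name PATTERN is hypothetical, and the input is assumed to have no indentation and plain \n line endings.

```python
import re

# The structured pattern from above, written as raw strings so the
# quotes need no escaping. (?s) must stay at the very start.
PATTERN = re.compile(
    r'(?s)<capability name="model_name" value="(.*?)"/>\n'
    r'<capability name="brand_name" value="(.*?)"/>\n'
    r'<capability name="marketing_name" value="(.*?)"/>\n'
    r'</group>\n'
    r'<group id="chips">\n'
    r'<capability name="nfc_support" value="true"/>'
)

# The two device blocks from the question, concatenated.
text = (
    '<capability name="model_name" value="T11"/>\n'
    '<capability name="brand_name" value="Turkcell"/>\n'
    '<capability name="marketing_name" value="Campaign"/>\n'
    '</group>\n'
    '<group id="chips">\n'
    '<capability name="nfc_support" value="true"/>\n'
    '</group>\n'
    '<capability name="model_name" value="U8650"/>\n'
    '<capability name="brand_name" value="Huawei"/>\n'
    '<capability name="marketing_name" value="Sonic"/>\n'
    '</group>\n'
    '<group id="chips">\n'
    '<capability name="nfc_support" value="true"/>\n'
)

# With three groups, findall returns one (model, brand, marketing) tuple per match.
matches = PATTERN.findall(text)
for model, brand, marketing in matches:
    print(model, brand, marketing)
```

One caveat: because (?s) lets . cross line breaks, a (.*?) group can run past its closing quote when that is the only way to reach a later nfc_support="true", for example when an earlier device has nfc_support="false". Replacing (.*?) with ([^"]*) is a stricter alternative if that matters for your data.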
Upvotes: 2
Reputation: 13
Forgive me if I'm wrong, but it looks like your expression of:
[A-Za-z1-9"=();,._/<>\s]
omits 0 from the digit range of the character class (it reads 1-9) and should thus be:
[A-Za-z0-9"=();,._/<>\s]
EDIT: This is in regard to your example of a non-match for "model_name" value="U8650", which contains a 0.
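A quick way to check this is to run both character classes against the failing block from the question. The sketch below uses Python's re module purely for illustration (the thread does not state the engine), and the variable names are hypothetical.

```python
import re

# The block from the question that the original regex cannot match.
block = (
    '<capability name="model_name" value="U8650"/>\n'
    '<capability name="brand_name" value="Huawei"/>\n'
    '<capability name="marketing_name" value="Sonic"/>\n'
    '</group>\n'
    '<group id="chips">\n'
    '<capability name="nfc_support" value="true"/>\n'
)

HEAD = r'<capability name="model_name"'
TAIL = r'<capability name="nfc_support" value="true"/>'

# Original class: digits 1-9 only, so the 0 in "U8650" stops the match.
original = re.compile(HEAD + r'[A-Za-z1-9"=();,._/<>\s]*' + TAIL)

# Corrected class: digits 0-9.
fixed = re.compile(HEAD + r'[A-Za-z0-9"=();,._/<>\s]*' + TAIL)

original_match = original.search(block)
fixed_match = fixed.search(block)

print(original_match)            # None: the class cannot consume the 0
print(fixed_match is not None)   # True
```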
Upvotes: 0