Reputation: 2497
I am doing some code cleanup and need to make sure that my gsub! only runs on a small section of the text. The portion I need to examine starts with {{Infobox television (\{\{[Ii]nfobox\s[Tt]elevision to be technical) and ends with the matching double braces "}}".
An example of the gsub! that will be run is text.gsub!(/\|(\s*)channel\s*=\s*(.*)\n/, "|\\1network = \\2\n")
...
{{Infobox television
| show_name = 60 Minutos
| image =
| director =
| developer =
| channel = [[NBC]]
| presenter = [[Raúl Matas]] (1977–86)<br />[[Raquel Argandoña]] (1979–81)
| language = [[Spanish language|Spanish]]
| first_aired = {{Date|7 April 1975}}
| website = {{url|https://foo.bar.com}}
}}
...
Note: sub instead of gsub is not an option, because multiple instances of the parameter to be substituted may exist. Stopping at the first }} is not an option either, as there may be multiple sets, as shown in the example above.
Upvotes: 2
Views: 103
Reputation: 627517
You may use a regex with a bit of recursion:
/(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|\g<1>)*}})/
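To restrict the question's substitution to the matched infobox, one option is the block form of gsub!: the outer call finds each balanced infobox, and the inner gsub only ever sees that substring. This is a minimal sketch rather than a definitive solution; the sample text and the variable names (infobox_re, text) are made up for illustration.

```ruby
# Recursive pattern from the answer above, unchanged.
infobox_re = /(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|\g<1>)*}})/

# Hypothetical sample: "channel" appears both inside and outside the infobox.
# dup keeps the string mutable regardless of frozen-string settings.
text = <<~WIKI.dup
  | channel = [[ABC]]
  {{Infobox television
  | show_name = 60 Minutos
  | channel = [[NBC]]
  | first_aired = {{Date|7 April 1975}}
  }}
  | channel = [[CBS]]
WIKI

# Outer gsub! yields each matched infobox; the inner gsub (from the question)
# rewrites the parameter only within that matched substring.
text.gsub!(infobox_re) do |infobox|
  infobox.gsub(/\|(\s*)channel\s*=\s*(.*)\n/, "|\\1network = \\2\n")
end

puts text
```

With this input, only the channel line inside the infobox becomes network; the two lines outside it are left alone, and the nested {{Date|...}} template does not end the match early.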
Or, if there are single { or } inside, you will also need to match those with (?<!{){(?!{)|(?<!})}(?!}):
/(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|(?<!{){(?!{)|(?<!})}(?!})|\g<1>)*}})/
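The difference between the two patterns can be checked on an infobox that contains a lone brace pair. The sketch below is only an illustration: the sample text is invented, the variable names are hypothetical, and the literal braces are written escaped (\{ and \}), which should be equivalent to the patterns above.

```ruby
# First pattern: only balanced {{ ... }} pairs are allowed inside.
strict = /(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|\g<1>)*\}\})/

# Second pattern: additionally accepts a single { or } that is not doubled.
tolerant = /(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|(?<!\{)\{(?!\{)|(?<!\})\}(?!\})|\g<1>)*\}\})/

# Hypothetical infobox with single braces in one parameter value.
sample = <<~WIKI
  {{Infobox television
  | note = set {a, b}
  | channel = [[NBC]]
  }}
WIKI

strict_match   = strict.match(sample)
tolerant_match = tolerant.match(sample)

puts strict_match.inspect   # the lone { stops the first pattern from matching
puts tolerant_match[0]      # the second pattern captures the whole infobox
```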
See the Rubular demo
Details:
(?=\{\{[Ii]nfobox\s[Tt]elevision) - a positive lookahead making sure the current location is followed with a {{Infobox television like string (with different casing)
(\{\{(?>[^{}]++|\g<1>)*}}) - Group 1 that matches the following:
    \{\{ - a {{ substring
    (?>[^{}]++|\g<1>)* - zero or more occurrences of:
        [^{}]++ - 1 or more chars other than { and }
        (?<!{){(?!{) - a { not enclosed with other {
        (?<!})}(?!}) - a } not enclosed with other }
        | - or
        \g<1> - the whole Group 1 subpattern
    }} - a }} substring
Upvotes: 1
Reputation: 2380
Can't give you a direct answer without spending a lot of time on it.
But it is notable that the first bracket set is at the beginning of a line, as is the last one.
So you have
/^{{(.*)^}}$/m
The m flag turns on Ruby's multiline mode, in which . also matches newlines. That will match everything between the braces - the () brackets mean that you can pull out what was matched inside the braces, for example:
string = <<_EOT
{{Infobox television
| show_name = 60 Minutos
| image =
| director =
| developer =
| channel = [[NBC]]
| presenter = [[Raúl Matas]] (1977–86)<br />[[Raquel Argandoña]] (1979–81)
| language = [[Spanish language|Spanish]]
| first_aired = {{Date|7 April 1975}}
| website = {{url|https://foo.bar.com}}
}}
_EOT
matcher = string.match(/^{{(.*)^}}$/m)
matcher[0] will give you the whole expression
matcher[1] will give you what was matched inside the () brackets
The danger with this is that .* does "greedy" matching and matches the largest piece of text it can, so you will have to turn this off, for example with the lazy form .*?. Without more info on what you're trying to do I can't help any more.
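To make the greedy problem concrete, here is a small sketch comparing .* with the lazy .*? on text that holds two line-anchored brace blocks. The sample text and variable names are made up, the braces are escaped for safety, and it assumes (as noted above) that each closing }} sits at the start of its own line.

```ruby
# Hypothetical text with two separate template blocks.
text = <<~WIKI
  {{Infobox television
  | channel = [[NBC]]
  }}
  Some article text.
  {{Infobox person
  | name = Example
  }}
WIKI

greedy = text.match(/^\{\{(.*)^\}\}$/m)   # .* runs on to the LAST line-start }}
lazy   = text.match(/^\{\{(.*?)^\}\}$/m)  # .*? stops at the FIRST line-start }}

puts greedy[1].include?("Infobox person")  # greedy capture swallowed the second block
puts lazy[0]                               # only the first block
```

The lazy version is still not a balanced match: it relies purely on the closing braces being at the start of a line, so nested templates that close mid-line (like {{Date|...}} in the question) are skipped over only because of that layout.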
NB - to match () brackets you have to escape them. See https://ruby-doc.org/core-2.1.1/Regexp.html for more info.
Upvotes: 0