Reputation: 2091
Using regular expressions, I need to extract the multiline content of a tag that has a specific id value. How can I do this?
This is what I currently have:
<div(.|\n)*?id="${value}"(.|\n)*?>(.|\n)*?<\/div>
The problem with this is this sample:
<div id="1">test</div><div id="2">test</div>
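If I want to replace the div with id="2" using this regexp (with ${value} = 2), the whole string gets matched. This is because (.|\n)*? between <div and id= can match anything, including the > and </div> of the first div, so the match starts at the first <div and runs on until id="2" is found, which is wrong.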
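The over-match can be reproduced with a short script. This is only a sketch, assuming Python's re module as the host language (the question does not say which one is used); the variable names are illustrative.

```python
import re

value = "2"
html = '<div id="1">test</div><div id="2">test</div>'

# The pattern from the question, with the id substituted for ${value}.
pattern = r'<div(.|\n)*?id="' + re.escape(value) + r'"(.|\n)*?>(.|\n)*?<\/div>'

match = re.search(pattern, html)
matched_text = match.group(0)

# The match begins at the first <div, not at the div with id="2".
print(matched_text)
```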
How can I do this?
Upvotes: 0
Views: 336
Reputation:
A fairly simple way is to use
Raw: <div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)</div>
Delimited: /<div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)<\/div>/
Use the variable in place of 2.
The content will be in group 1.
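As a rough illustration of the pattern above with a variable in place of 2, here is a sketch that assumes Python's re module (the host language is not stated in the question). The helper name div_content_by_id is hypothetical, and re.escape is added so that an id containing regex metacharacters is matched literally.

```python
import re

def div_content_by_id(html, value):
    """Return the content of the first <div> whose id equals value, or None."""
    # The "raw" pattern from this answer, with the id substituted in.
    pattern = (
        r'<div(?=\s)[^>]*?\sid="'
        + re.escape(value)
        + r'"[^>]*?>([\S\s]*?)</div>'
    )
    m = re.search(pattern, html)
    # Group 1 holds everything between the opening tag and the first </div>.
    return m.group(1) if m else None

html = '<div id="1">first</div><div class="x" id="2">line one\nline two</div>'
content = div_content_by_id(html, "2")
print(content)
```

Because [^>] cannot cross the > that closes an opening tag, the search for the id stays inside a single tag.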
Upvotes: 1
Reputation: 782498
Inside the opening tag, change (.|\n) to [^>] so it won't match the > that ends the tag. Then the search for the id can't run across different divs.
<div\b[^>]*\bid="${value}"[^>]*>.*?<\/div>
Also, instead of using (.|\n)* to match across multiple lines, use the s modifier to the regexp. This makes . match any character, including newlines.
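A minimal sketch of this, assuming Python, where the s modifier is spelled re.DOTALL (other languages write it as /s or a similar flag). The capture group around .*? is an addition for convenience and is not part of the pattern above.

```python
import re

value = "2"
html = '<div id="1">test</div>\n<div id="2">multi\nline</div>'

# Same pattern as above, with a capture group added around the content.
source = r'<div\b[^>]*\bid="' + re.escape(value) + r'"[^>]*>(.*?)</div>'

# With DOTALL, "." also matches newlines, so multiline content is found.
with_flag = re.search(source, html, re.DOTALL)

# Without it, "." stops at the newline inside the div and nothing matches.
without_flag = re.search(source, html)

print(with_flag.group(1))
```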
However, using regular expressions to parse HTML is not very robust. You should use a DOM parser.
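For comparison, here is a sketch of the parser route, assuming Python. Its standard library has no HTML DOM, so this uses the event-based html.parser.HTMLParser as a stand-in for a DOM parser; the class and function names are hypothetical. It collects only the text inside the div, not its markup.

```python
from html.parser import HTMLParser

class DivTextById(HTMLParser):
    """Collects the text inside the first <div> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # div nesting depth inside the target; 0 = outside
        self.found = False   # True once the target div has been entered
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1   # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1  # reaching 0 means the target div has closed

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

def div_text_by_id(html, target_id):
    parser = DivTextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None

html = '<div id="1">test</div><div id="2">outer <div>inner</div> tail</div>'
text = div_text_by_id(html, "2")
print(text)
```

Unlike the lazy .*?<\/div> in the regex, which stops at the first closing tag, the depth counter here keeps track of nested divs.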
Upvotes: 0