Reputation: 3996
I'm trying to extract attributes from an HTML string in JavaScript with a RegExp, but I have one last problem.
I can match attributes with or without values, and I can match attributes even when the space between them is missing, but my RegExp also matches the tag name as an attribute.
Live example: http://regex101.com/r/zX5dJ7/3
the regexp: (\s*\w+(?:=\"[^\"]*(?:\")?)?)
example html: <div name="value"otherattribute foo="bar/>
Is there a way to make the RegExp skip the tag name?
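For reference, the behaviour can be reproduced with a short script. This is only a sketch: the helper name `rawMatches` is made up for illustration, and the pattern is the one above with the redundant `\"` escapes dropped, which should not change what it matches.

```javascript
// The pattern from the question (escapes on the quotes removed).
const originalPattern = /(\s*\w+(?:="[^"]*"?)?)/g;

// Hypothetical helper: returns every match with surrounding whitespace removed.
function rawMatches(html) {
  return (html.match(originalPattern) || []).map((m) => m.trim());
}

// The first entry is the tag name "div", which is the unwanted match.
console.log(rawMatches('<div name="value"otherattribute foo="bar/>'));
```

Note that the unclosed quote on the last attribute makes its value run to the end of the string, so `/>` ends up inside that match.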
EDIT:
If the HTML is this:
<meta charset="utf-8" alone foo="tab"/>
<meta charset2="utf-8"foo2="tab"/>
<meta charset3="utf-8"alone2 foo3="tab unclosed/>
I want to capture every attribute, like this:
My previous RegExp works well, but it also captures the tag name. I just want the RegExp to skip the tag name.
Upvotes: 1
Views: 109
Reputation: 20361
This is the best I can come up with:
([<\w\-]+(?:=)?(?:"|')?[\w\-]+(?:"|')?)
You will have to skip matches that begin with < after running the regex.
DEMO: http://regex101.com/r/aL1sQ0/1
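A minimal sketch of that post-filtering step in JavaScript, assuming the input is a single tag whose attribute values contain no whitespace. The function name `extractAttributes` is hypothetical.

```javascript
// Pattern from this answer; it matches the tag name (with its "<") too.
const attributePattern = /([<\w\-]+(?:=)?(?:"|')?[\w\-]+(?:"|')?)/g;

// Hypothetical helper: collect all matches, then drop the ones starting with "<".
function extractAttributes(html) {
  const matches = html.match(attributePattern) || [];
  return matches.filter((m) => !m.startsWith('<'));
}

console.log(extractAttributes('<meta charset="utf-8" alone foo="tab"/>'));
// [ 'charset="utf-8"', 'alone', 'foo="tab"' ]
```

Because the value part is `[\w\-]+`, a value with a space in it (such as `"tab unclosed`) would be split into two matches, so this only fits the simpler inputs.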
Edit: Final solution by Jordan himself: (?:<\w+)?(\s*\w+(?:=\"[^\"]*(?:\")?)?)?
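Here is one way that final pattern might be used, as a sketch rather than a definitive implementation. The optional `(?:<\w+)?` prefix consumes the tag name outside the capture group, so only group 1 needs to be read. Every part of the pattern is optional, so it also produces empty matches, which the loop skips. The name `attributesWithoutTag` is made up for this example, and the `\"` escapes are dropped.

```javascript
// Final pattern from the edit above (quote escapes removed).
const finalPattern = /(?:<\w+)?(\s*\w+(?:="[^"]*"?)?)?/g;

// Hypothetical helper: keep only matches where capture group 1 took part.
function attributesWithoutTag(html) {
  const attributes = [];
  for (const match of html.matchAll(finalPattern)) {
    if (match[1] !== undefined) {
      attributes.push(match[1].trim());
    }
  }
  return attributes;
}

console.log(attributesWithoutTag('<meta charset2="utf-8"foo2="tab"/>'));
// [ 'charset2="utf-8"', 'foo2="tab"' ]
```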
Upvotes: 1
Reputation: 1302
Assuming properly formatted HTML (see my comment on the OP for why we should assume well-formed HTML), this regex will match everything you want. It also captures the tag name together with its leading "<", so you can easily tell what is a tag and what isn't, and discard the tag:
(\w+(=\".*?\"|)|<\w+)
and in action
Parsing randomly malformed HTML is really NOT a job for regex. I cite here the countless cries of pain of many a regexper when they get asked the question of "How can I parse HTML with regular expressions?". Search stackoverflow for such questions and see what people answer. You'll see exactly why we should assume non-malformed HTML.
As stated above, after you get your matches and put them in an array or something, you can check for any string that starts with "<" and you'll know it's a tag. The rest of the matches are attributes, captured along with their values, so no worries there.
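That check could look roughly like this, assuming well-formed input as stated above. The function name `splitTagAndAttributes` and the shape of the returned object are assumptions made for the example.

```javascript
// Pattern from this answer (quote escapes removed).
const tagOrAttribute = /(\w+(=".*?"|)|<\w+)/g;

// Hypothetical helper: sort each match into "tags" or "attributes"
// depending on whether it starts with "<".
function splitTagAndAttributes(html) {
  const tags = [];
  const attributes = [];
  for (const m of html.match(tagOrAttribute) || []) {
    (m.startsWith('<') ? tags : attributes).push(m);
  }
  return { tags, attributes };
}

console.log(splitTagAndAttributes('<meta charset="utf-8" alone foo="tab"/>'));
// { tags: [ '<meta' ], attributes: [ 'charset="utf-8"', 'alone', 'foo="tab"' ] }
```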
Upvotes: 0
Reputation: 6720
If you want to get everything between a tag's name and its closing />, you could use
(?:<\w*)(.*)\/>
Then you can extract whatever you want from the captured text. If you need further info, let me know.
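As a rough sketch of that two-step idea: first capture the text between the tag name and `/>`, then run the asker's original attribute pattern over the captured text only. Since the tag name is no longer in the text being scanned, it cannot be matched as an attribute. The name `attributesFromTag` is hypothetical, and the sketch assumes one self-closing tag on a single line.

```javascript
// Step 1: pattern from this answer, capturing everything up to "/>".
const tagBodyPattern = /(?:<\w*)(.*)\/>/;
// Step 2: the asker's attribute pattern (quote escapes removed).
const attributeInBody = /\s*\w+(?:="[^"]*"?)?/g;

// Hypothetical helper: returns [] when the input is not a self-closing tag.
function attributesFromTag(tag) {
  const body = tagBodyPattern.exec(tag);
  if (body === null) {
    return [];
  }
  return (body[1].match(attributeInBody) || []).map((m) => m.trim());
}

console.log(attributesFromTag('<meta charset3="utf-8"alone2 foo3="tab unclosed/>'));
// [ 'charset3="utf-8"', 'alone2', 'foo3="tab unclosed' ]
```

A side effect of stripping `/>` first is that an unclosed quote on the last attribute no longer swallows the tag's closing characters.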
Upvotes: 1