Reputation: 42444
I want to get only the opening HTML tags. Let's say I have HTML like this:
<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>
<ul><li>List Item</li></ul>
From the above HTML I want to extract this information:
<div
<br
<p
<ul
<li
Note that I don't need the closing '>' of the tags.
Upvotes: 0
Views: 203
Reputation: 120858
How about this:
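// Requires: import java.util.Scanner;
String input = "<div class=\"some\">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul><6>";
Scanner scanner = new Scanner(input);
String result;
// findInLine returns the next match, or null once no match remains in the line.
// Note that \w also matches digits, so the trailing "<6>" is reported as "<6".
while ((result = scanner.findInLine("<\\w+")) != null) {
    System.out.println(result);
}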
Upvotes: 0
Reputation: 56915
Try the regex /<[a-zA-Z]+[1-6]?/g. I added the [1-6] for the HTML heading tags (h1 to h6); I think they're the only ones with numbers. If you want to be sure, you could use /<[a-zA-Z0-9]+/g, since in HTML a < always starts a tag (unless it's a comment, <!--), because a literal < in the text gets converted to &lt;.
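As a rough illustration of the two patterns above, here is a minimal JavaScript sketch. The heading and the comment appended to the question's sample are my own additions, and the variable names are only for illustration. Neither pattern picks up closing tags or the comment opener, because the character after < must be a letter or digit.

```javascript
// Sample markup from the question, plus a heading and a comment
// (both appended here purely for illustration).
const html =
  '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul><h1>Title</h1><!-- note -->';

// Letters, then an optional heading digit 1-6.
const lettersThenDigit = /<[a-zA-Z]+[1-6]?/g;
// Any run of ASCII letters and digits.
const lettersOrDigits = /<[a-zA-Z0-9]+/g;

// With the g flag, match() returns an array of every matched substring.
const strictMatches = html.match(lettersThenDigit);
const looseMatches = html.match(lettersOrDigits);

console.log(strictMatches); // [ '<div', '<br', '<p', '<ul', '<li', '<h1' ]
console.log(looseMatches);  // same result for this input
```

On this input both patterns give the same result; they would only differ for names that mix digits in other positions.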
Upvotes: 1
Reputation: 10565
The following returns an array of the matches you want from the HTML body:
'<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>'.match(/<\w+/g)
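One thing to watch for with this approach: when the g flag is set and nothing matches, match() returns null rather than an empty array. A small wrapper can smooth that over. This is only a sketch, and the function name openingTags is made up for the example.

```javascript
// Hypothetical helper: returns every "<name" prefix found in the input,
// or an empty array when the input contains no tags at all.
function openingTags(html) {
  // match() with the g flag yields null on no match, so fall back to [].
  return html.match(/<\w+/g) || [];
}

const sample =
  '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul>';

console.log(openingTags(sample));       // [ '<div', '<br', '<p', '<ul', '<li' ]
console.log(openingTags('plain text')); // []
```

Callers can then loop over the result without a null check.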
Upvotes: 1