Reputation: 42444

Regex for getting HTML starting tags

I want to get only the starting HTML tags. Let's say I have HTML like this:

<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>
<ul><li>List Item</li></ul>

From the above HTML I want to extract this information:

<div
<br
<p
<ul
<li

Note that I don't need the closing '>' of the tags.

Upvotes: 0

Answers (3)

Eugene

Reputation: 120858

How about this:

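import java.util.Scanner;

String input = "<div class=\"some\">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul><6>";
Scanner scanner = new Scanner(input);
String result;
// findInLine returns null once no further match is left on the current line.
// \w also matches digits and underscores, so the trailing <6> is printed as <6.
while ((result = scanner.findInLine("<\\w+")) != null) {
    System.out.println(result);
}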

Upvotes: 0

mathematical.coffee

Reputation: 56915

Try the regex /<[a-zA-Z]+[1-6]?/g. I added the [1-6] for the HTML heading tags; I think they're the only ones with numbers. If you want to be sure, you could use /<[a-zA-Z0-9]+/g, since in HTML a < always starts a tag (unless it's a comment, <!--), because an in-line < gets converted to &lt;.
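To see how the two patterns above behave, here is a small JavaScript sketch. The sample strings and the variable names (strict, loose) are made up for illustration and are not from the question. It assumes the input escapes literal < as &lt;, as the answer describes.

```javascript
// Hypothetical sample input, made up for this illustration.
const html = '<!-- note --><h2 class="t">Title</h2><p>a &lt; b</p>';

const strict = /<[a-zA-Z]+[1-6]?/g; // letters, then an optional heading digit
const loose = /<[a-zA-Z0-9]+/g;     // letters and digits in any order

// match() returns null when nothing matches, so fall back to an empty array.
const strictTags = html.match(strict) || [];
const looseTags = html.match(loose) || [];
console.log(strictTags); // [ '<h2', '<p' ]
console.log(looseTags);  // [ '<h2', '<p' ]

// The patterns only differ on a "tag" that starts with a digit.
const strictOnDigit = '<6>'.match(strict); // null
const looseOnDigit = '<6>'.match(loose);   // [ '<6' ]
console.log(strictOnDigit, looseOnDigit);
```

Neither pattern picks up the comment opener or the closing tags, because the character after < must be a letter (or, for the looser pattern, a letter or digit).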

Upvotes: 1

James Jithin

Reputation: 10565

The following returns an array of the matches you want from the HTML body:

'<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>'.match(/<\w+/g)
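If you need this in more than one place, the one-liner above could be wrapped roughly like this. This is only a sketch: startingTags is a made-up helper name, and the fallback to an empty array is my addition, since match() returns null when there is no match at all.

```javascript
// Hypothetical helper; returns [] instead of null when nothing matches.
function startingTags(html) {
  return html.match(/<\w+/g) || [];
}

const sample = '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>';
const tags = startingTags(sample);
console.log(tags); // [ '<div', '<br', '<p', '<ul', '<li' ]

// Input without any tags yields an empty array rather than null.
const none = startingTags('plain text');
console.log(none); // []
```

Closing tags such as </p> are skipped because / is not a word character, so <\w+ cannot match there.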

Upvotes: 1
