Freedom

Reputation: 347

HTML tag finder

I am trying to write a method that finds and returns the first tag in a given HTML string, or returns null if no tag is found. (A tag would be something like <b>.)

I looked through the String class methods but couldn't find one that suits this purpose. My plan is to scan the string for a "<" and, once it is found, scan for the next ">", but I am unsure how to do so. I am also wondering whether I should put a while/for loop in there. Help is appreciated, thank you.

public class HTMLProcessor {

    public static void main(String[] args) {
        System.out.println(findFirstTag("<b>The man jumped.</b>"));
    }

    // Returns the first "<...>" span in text, or null if there is none.
    public static String findFirstTag(String text) {
        int firstIndex = text.indexOf("<");
        if (firstIndex >= 0) {
            // Search from firstIndex so the result is an index into text,
            // not into a shortened copy of it.
            int secondIndex = text.indexOf(">", firstIndex);
            if (secondIndex >= 0) {
                return text.substring(firstIndex, secondIndex + 1);
            }
        }
        return null;
    }
}

Upvotes: 3

Views: 906

Answers (3)

Suresh Atta

Reputation: 121998

You can try the indexOf() and lastIndexOf() methods of the String class.

If you are doing this multiple times and in multiple places, though, you definitely need an HTML parser. Just pick one; Jsoup is one of the best HTML parsers.

And avoid relying on regex when dealing with HTML strings.
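For the simple case, the indexOf() idea can be sketched with the overload that takes a start offset, which also answers the question about loops. This is only a rough illustration: the class name TagScanner and the method findAllTags are made up for this example, and it treats anything between "<" and the next ">" as a tag, so comments, scripts, and attribute values containing ">" will confuse it. That limitation is the reason a real parser is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class TagScanner {

    // Collects every "<...>" span in text, in order of appearance.
    static List<String> findAllTags(String text) {
        List<String> tags = new ArrayList<>();
        int open = text.indexOf('<');
        while (open >= 0) {
            // Look for the closing bracket only after the opening one.
            int close = text.indexOf('>', open + 1);
            if (close < 0) {
                break; // unterminated "<": stop scanning
            }
            tags.add(text.substring(open, close + 1));
            // Continue the search after the tag just collected.
            open = text.indexOf('<', close + 1);
        }
        return tags;
    }
}
```

Taking the first element of the returned list (after checking that it is not empty) gives the first tag; a string without any tags yields an empty list.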

Upvotes: 2

zero_dev

Reputation: 653

Use regular expressions.

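// CASE_INSENSITIVE is needed because [A-Z] alone would not match lowercase tags such as <b>.
Pattern p = Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)</\\1>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(yourText);

After a successful m.find(), this will match things like <b>this is bold</b>.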
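The pattern above matches a whole element, from the opening tag through the closing tag. If only the first tag itself is wanted, as in the question, a simpler pattern may be enough. The following is a minimal sketch under that assumption; the class name RegexTagFinder and the pattern are illustrative choices, and the pattern does not handle comments or ">" inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class RegexTagFinder {

    // "<", an optional "/", a letter, then everything up to the next ">".
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    // Returns the first tag in text, or null if no tag is found.
    static String findFirstTag(String text) {
        Matcher m = TAG.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

Requiring a letter right after "<" means a bare less-than sign, as in "1 < 2", is skipped instead of being treated as the start of a tag.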

Upvotes: 2

kddeisz

Reputation: 5192

Take a look at Java regular expressions here. If you need an introduction to regex, look here. This is probably the quickest way to accomplish what you're looking for.

Upvotes: 1
