Reputation: 347
I am trying to create a method that finds and returns the first tag in a given HTML string, and returns null if no such tag is found. (A tag would be something like <b>.)
I looked through the String class methods but can't find one that suits this purpose. My plan is to scan each word for a "<" and, once it is found, scan for a ">", but I am unsure of how to do so. I'm also wondering whether I should put a while/for loop in there. Help is appreciated, thank you.
public class HTMLProcessor {
    public static void main(String[] args) {
        System.out.println(findFirstTag("<b>The man jumped.</b>"));
    }

    public static String findFirstTag(String text) {
        int firstIndex = text.indexOf("<");
        if (firstIndex >= 0) {
            // Search for ">" starting at firstIndex so the result is an index into text itself,
            // not into a substring (which would be off by firstIndex).
            int secondIndex = text.indexOf(">", firstIndex);
            if (secondIndex >= 0) {
                return text.substring(firstIndex, secondIndex + 1);
            }
        }
        // No "<" found, or no ">" after it
        return null;
    }
}
Upvotes: 3
Views: 906
Reputation: 121998
You can try the indexOf() and lastIndexOf() methods from the String class.
If you are doing this many times and in many places, though, you definitely need an HTML parser. Just pick one; Jsoup is one of the best HTML parsers.
And don't rely much on regex when dealing with HTML strings.
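As a rough illustration of the two String methods mentioned above, here is a minimal sketch. The class name TagBounds and the helpers firstTag and lastTag are made up for this example, and it assumes a "tag" is simply whatever sits between a "<" and the next ">", which is not true of all real HTML (comments, attribute values containing ">", and so on).

```java
class TagBounds {
    // Hypothetical helper: returns the first "<...>" span in text, or null if there is none.
    static String firstTag(String text) {
        int open = text.indexOf('<');
        if (open < 0) {
            return null;
        }
        // indexOf(ch, fromIndex) searches forward from open + 1 and
        // returns an index into the original string.
        int close = text.indexOf('>', open + 1);
        return close < 0 ? null : text.substring(open, close + 1);
    }

    // Hypothetical helper: returns the last "<...>" span in text, or null if there is none.
    static String lastTag(String text) {
        int close = text.lastIndexOf('>');
        if (close < 0) {
            return null;
        }
        // lastIndexOf(ch, fromIndex) searches backward starting at close.
        int open = text.lastIndexOf('<', close);
        return open < 0 ? null : text.substring(open, close + 1);
    }

    public static void main(String[] args) {
        String html = "The <b>man</b> jumped.";
        System.out.println(firstTag(html)); // prints <b>
        System.out.println(lastTag(html));  // prints </b>
    }
}
```

Note that lastIndexOf() on its own finds the last ">" in the whole string, so it is useful for the last tag rather than the first one.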
Upvotes: 2
Reputation: 653
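Use regular expressions.
Pattern p = Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)</\\1>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(yourText);
With the CASE_INSENSITIVE flag (the character classes are uppercase only), m.find() will match whole elements like <b>this is bold</b>, and m.group(1) gives the tag name.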
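If only the first tag itself is wanted rather than the whole element, a simpler pattern may be enough. The following is a sketch, not a robust HTML matcher: the class name FirstTagRegex and the pattern are assumptions made for this example, and it treats a tag as "<", an optional "/", a name, and everything up to the next ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTagRegex {
    // Assumed tag shape: "<", optional "/", a name starting with a letter,
    // then any characters except ">" up to the closing ">".
    private static final Pattern TAG =
            Pattern.compile("</?[A-Za-z][A-Za-z0-9]*\\b[^>]*>");

    // Returns the first substring that looks like a tag, or null if none is found.
    static String findFirstTag(String text) {
        Matcher m = TAG.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findFirstTag("<b>The man jumped.</b>")); // prints <b>
        System.out.println(findFirstTag("no tags here"));           // prints null
    }
}
```

Because the name must start with a letter, a stray "<" as in "a < b" is skipped instead of being treated as the start of a tag.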
Upvotes: 2