wa4_tasty_elephant

Reputation: 157

Regex: section contains

I need to know whether a section of a string contains a specific word.

Example: search for color=" between <font and >

<font color="black">                                 = <font color="black">
BlaBla <font color="red">                            = <font color="red">
<font size="2" color="white">                        = <font size="2" color="white">
<font size="2">                                      = false
<font size="10"><font color="black"><font size="10"> = <font color="black">

I am using Java with String.matches().
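One thing worth knowing here: String.matches() only returns true when the *entire* string matches the pattern. It can answer "does this string contain a matching tag?" if you pad the pattern with .* on both sides, but it cannot hand back the matching section; Matcher.find() can. A minimal sketch of the difference, assuming a simple <font ...color="..."...> pattern (the class and method names are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // A <font ...> opening tag that has a color="..." attribute
    static final String FONT_WITH_COLOR = "<font[^>]*color=\"[^\"]+\"[^>]*>";

    // String.matches() must match the WHOLE input, so pad with .* on both sides
    // (assumes single-line input: .* does not cross newlines by default)
    static boolean contains(String input) {
        return input.matches(".*" + FONT_WITH_COLOR + ".*");
    }

    // Matcher.find() locates the matching section itself, or returns null
    static String section(String input) {
        Matcher m = Pattern.compile(FONT_WITH_COLOR).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "BlaBla <font color=\"red\">";
        System.out.println(input.matches(FONT_WITH_COLOR)); // false
        System.out.println(contains(input));                 // true
        System.out.println(section(input));                  // <font color="red">
    }
}
```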

Upvotes: 1

Views: 115

Answers (4)

Arnaud Denoyelle

Reputation: 31245

You can handle this with a regex, but doing so is hazardous.

On the other hand, JSOUP is designed for exactly this use case and is very easy to use.

Example:

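import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FontColor {
  public static void main(String[] argv) {
    Document document = Jsoup.parse("<font id=\"myFont\" color=\"black\">");
    // Select every <font> element and print its color attribute
    Elements fonts = document.select("font");
    for (Element element : fonts) {
      System.out.println(element.attr("color"));
    }
  }
}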

Output:

black
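To make the "hazardous" point concrete: a quoted HTML attribute value may legally contain a > character, and a regex built on [^>]* stops at the first one. A small illustration, assuming a pattern of the form <font[^>]*color="[^"]+"[^>]*> and a made-up input; an HTML parser such as JSOUP reads the attribute value as a whole instead:

```java
import java.util.regex.Pattern;

public class RegexPitfall {
    // A [^>]*-based pattern for a <font> tag with a color attribute
    static final Pattern FONT_WITH_COLOR =
            Pattern.compile("<font[^>]*color=\"[^\"]+\"[^>]*>");

    static boolean containsColoredFont(String html) {
        return FONT_WITH_COLOR.matcher(html).find();
    }

    public static void main(String[] args) {
        // Valid HTML: the title attribute value contains a '>' character
        String html = "<font title=\"a > b\" color=\"red\">";
        // [^>]* cannot get past the '>' inside the title value, so there is no match
        System.out.println(containsColoredFont(html)); // false
    }
}
```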

Upvotes: 2

Saleem

Reputation: 8988

Try the following regex:

(?<=\<)(\w+)[^<]*color.*?\>

Demo:

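import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTag {
    public static void main(String[] args) {
        String data = "<font color=\"black\">";
        String strFind = "color";
        // Pattern.quote() escapes any regex metacharacters in the search word
        Pattern regex = Pattern.compile("(?<=<)(\\w+)[^<]*" + Pattern.quote(strFind) + ".*?>");
        Matcher matcher = regex.matcher(data);
        while (matcher.find()) {
            // Group 1 captures the tag name, e.g. "font"
            System.out.println(matcher.group(1));
        }
    }
}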

Given the sample text, it prints the name of the tag that contains the desired string; in this case, font.
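If you want the whole tag, as in the question's expected output, rather than just the tag name, one possible adaptation (not part of the answer above) is to consume the < instead of checking it with a lookbehind, keep the search inside a single tag with [^<>]*, and collect matcher.group(). A minimal sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindWholeTag {
    // Returns every opening tag whose attribute text contains the given word
    static List<String> tagsContaining(String data, String word) {
        // '<' is consumed rather than checked with a lookbehind, so group()
        // includes it; [^<>]* keeps the search for the word inside one tag
        Pattern regex = Pattern.compile("<(\\w+)[^<>]*" + Pattern.quote(word) + "[^>]*>");
        Matcher matcher = regex.matcher(data);
        List<String> tags = new ArrayList<>();
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String data = "<font size=\"10\"><font color=\"black\"><font size=\"10\">";
        System.out.println(tagsContaining(data, "color")); // [<font color="black">]
    }
}
```

Like the original, this matches the word anywhere in the tag, so searching for color would also hit an attribute such as bgcolor.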

Upvotes: 1

1ac0

Reputation: 2939

For parsing HTML, it is better to use JSOUP. For a quick introduction, start with the JSOUP cookbook.

Upvotes: 3

mellamokb

Reputation: 56779

Based only on the test cases you provided, you might be able to get away with a simple regular expression like this:

<font[^>]*color="[^"]+"[^>]*>

Demo: http://jpad.io/example/1u/36573959-example

However, as pointed out in the comments, regular expressions are generally not well-suited for processing HTML.
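For reference, here is one way the pattern above could be run in Java against the five inputs from the question, using Matcher.find() rather than String.matches() (which would require the whole string to match and so reject the second and fifth inputs). A sketch only; the class and method names are illustrative, and "false" mirrors the question's notation for "no match":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FontColorCases {
    // The pattern from this answer, written as a Java string literal
    static final Pattern FONT_WITH_COLOR =
            Pattern.compile("<font[^>]*color=\"[^\"]+\"[^>]*>");

    // Returns the first <font> tag that has a color attribute, or "false"
    static String check(String input) {
        Matcher m = FONT_WITH_COLOR.matcher(input);
        return m.find() ? m.group() : "false";
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<font color=\"black\">",
            "BlaBla <font color=\"red\">",
            "<font size=\"2\" color=\"white\">",
            "<font size=\"2\">",
            "<font size=\"10\"><font color=\"black\"><font size=\"10\">"
        };
        for (String input : inputs) {
            System.out.println(input + " = " + check(input));
        }
    }
}
```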

Upvotes: 2
