Jvalant Dave

Reputation: 531

How to check if a certain Pattern is in a string representation of an xml response?

I have the following code:

    Matcher title = Pattern.compile("<Title> (.+?)</Title>").matcher(epg); // for new dongle setup
    // Matcher title = Pattern.compile("<Title> \"(.+?)\"</Title>").matcher(epg); // for old dongle setup

I have an XML response in string form that I want to run the matcher against. The title will be in either this format:

<Title> "The Ellen DeGeneres Show"</Title>

or this format:

<Title> The Ellen DeGeneres Show</Title>

So essentially it's a difference of quotation marks. How can I write an if statement that checks for this before I choose which pattern to use? To sum up:

if (pattern is with quotation marks) {
    Matcher title = Pattern.compile("<Title> \"(.+?)\"</Title>").matcher(epg);
} else if (pattern is without quotation marks) {
    Matcher title = Pattern.compile("<Title> (.+?)</Title>").matcher(epg);
}

I can't wrap my head around what to put in the if statements.

Upvotes: 2

Views: 719

Answers (4)

sergus

Reputation: 11

Try to use this code:

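    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    // Parse the XML string into a DOM document
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    String xml = "<root><Title>test</Title></root>";
    Document dDoc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    // Select the first Title element anywhere in the document
    XPath xPath = XPathFactory.newInstance().newXPath();
    Node node = (Node) xPath.evaluate("//Title", dDoc, XPathConstants.NODE);
    System.out.println(node.getTextContent());

    final String text = node.getTextContent().trim();
    // String.matches() tests the whole string, so no extra anchors or escaping are needed
    if (text.matches("\".*\"")) {
        // Between double quotes
    } else {
        // No quotes
    }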

Find the "Title" node first, then check its content against the pattern.

Upvotes: 1

leeyuiwah

Reputation: 7152

Try writing the regex for the two respective situations, and then use the | operator to join them up.

The following is my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexOptionalQuotationMarks {

    public static void main(String[] args) {
        String[] input = {
                "<Title> \"The Ellen DeGeneres Show\"</Title>"
                , "<Title> The Ellen DeGeneres Show</Title>"
        };

        String regexWithoutQm   = "<Title>\\s*\\w[^<]*</Title>";
        String regexWithQm      = "<Title>\\s*\"[^\"<]*\"\\s*</Title>";
        String regexBoth        = regexWithoutQm + "|" + regexWithQm;
        Pattern p = Pattern.compile(regexBoth);
        for (String s : input) {
            Matcher m = p.matcher(s);
            System.out.format("matching input %s ... %b%n", s, m.find());
        }

    }

}

The output of this program was this:

matching input <Title> "The Ellen DeGeneres Show"</Title> ... true
matching input <Title> The Ellen DeGeneres Show</Title> ... true
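The program above only reports whether a title matches. If the title text itself is needed, one possible extension is to put a capturing group in each branch of the alternation. This is only a sketch: the class name `TitleAlternationSketch` and the method `extractTitle` are made up for illustration, and the combined pattern is my own variation on the two regexes above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleAlternationSketch {

    // Group 1 captures a quoted title, group 2 an unquoted one.
    private static final Pattern TITLE = Pattern.compile(
            "<Title>\\s*(?:\"([^\"<]*)\"|(\\w[^<]*))\\s*</Title>");

    /** Returns the title without quotation marks, or null if no title matches. */
    static String extractTitle(String xml) {
        Matcher m = TITLE.matcher(xml);
        if (!m.find()) {
            return null;
        }
        // Only the group in the branch that matched is non-null.
        String title = (m.group(1) != null) ? m.group(1) : m.group(2);
        return title.trim();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<Title> \"The Ellen DeGeneres Show\"</Title>"));
        System.out.println(extractTitle("<Title> The Ellen DeGeneres Show</Title>"));
    }
}
```

Checking which group is non-null also tells you which of the two formats was seen, should the caller need that distinction.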

Upvotes: 1

Jvalant Dave

Reputation: 531

On @UrosK's suggestion, I looked up how to make characters optional in a regex. It turns out I have to add a question mark after the character I want to be optional. My statement now looks like this:

Matcher title = Pattern.compile("<Title> \"?(.+?)\"?</Title>").matcher(epg);
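For anyone who wants to try this, here is a small self-contained sketch that runs the pattern against both formats from the question. The class name `OptionalQuoteSketch` and the helper `titleOf` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuoteSketch {

    // Same pattern as above: each quotation mark may or may not be present.
    private static final Pattern TITLE =
            Pattern.compile("<Title> \"?(.+?)\"?</Title>");

    /** Returns the captured title, or null when the input has no matching title. */
    static String titleOf(String epg) {
        Matcher m = TITLE.matcher(epg);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Both calls yield the title without quotation marks.
        System.out.println(titleOf("<Title> \"The Ellen DeGeneres Show\"</Title>"));
        System.out.println(titleOf("<Title> The Ellen DeGeneres Show</Title>"));
    }
}
```

Because the two question marks are independent, this pattern also accepts a title with only one of the two quotation marks. That is usually harmless, but worth knowing.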

Upvotes: 2

PNS

Reputation: 19905

You can simply try

Matcher title = Pattern.compile("<Title>\\s*\"?([^\"]*)\"?</Title>").matcher(epg);

to allow for any amount of whitespace (\s*) after the opening tag.
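If the response contains several titles, the same idea can be applied in a `find()` loop. The sketch below rests on two assumptions of mine that are not in the answer: `<` is also excluded from the character class, so a match cannot run past a closing tag into the next element, and `\s*` is allowed before the closing tag as well. The names `AllTitlesSketch` and `allTitles` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllTitlesSketch {

    // Optional quotes, flexible whitespace, and no '<' or '"' inside the title.
    private static final Pattern TITLE =
            Pattern.compile("<Title>\\s*\"?([^\"<]*)\"?\\s*</Title>");

    /** Collects every title found in the input, trimmed and without quotes. */
    static List<String> allTitles(String epg) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(epg);
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }

    public static void main(String[] args) {
        String epg = "<Title> \"First Show\"</Title><Title>Second Show </Title>";
        System.out.println(allTitles(epg));
    }
}
```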

Upvotes: 1
