Marcelo Tataje

Reputation: 3871

Java Regex for a requested URL with XML

I know there are already many regex questions, but I'm trying to meet a requirement involving a URL. The URL looks like this:

POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery=<?xml version='1.0' encoding='UTF-8'?><query ids="16914"><matchWord>avoir</matchWord><fullText><![CDATA[]]></fullText><quotedText><![CDATA[]]></quotedText><sensitivity></sensitivity><operator>AND</operator><offsetCooc>0</offsetCooc><cooc></cooc><collection>0</collection><searchOn>all</searchOn><nbResultDisplay>10</nbResultDisplay><nbResultatsParAspect>5</nbResultatsParAspect><nbCoocDisplay>8</nbCoocDisplay><offsetDisplay>0</offsetDisplay><sortBy>date</sortBy><dateAfter>0</dateAfter><dateBefore>0</dateBefore><ipClient>82.122.169.244</ipClient><typeQuery>0</typeQuery><equivToDelete></equivToDelete><allCooc>false</allCooc><versionDTD>3.0.5</versionDTD><r34>1tcbet30]</r34><mi>IND</mi></query>&callback=__gwt_jsonp__.P1.onSuccess&failureCallback=__gwt_jsonp__.P1.onFailure HTTP/1.1

It is a request URL sent to a REST web service. Inside this URL there is the tag <query ids="16914">.

I want to extract the number 16914 from the whole URL. This is the regex I tried:

private static Pattern p = Pattern.compile(
"<\\?xml version='1.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

I tried tools like Debuggex but can't figure out what the problem is. I'd prefer to use a regex rather than a chain of String method calls.

I would really appreciate any help. Thanks a lot in advance.

Upvotes: 0

Views: 188

Answers (2)

melwil

Reputation: 2553

There is nothing wrong with your regex; it works for me:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        String s = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery=<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord><fullText><![CDATA[]]></fullText><quotedText><![CDATA[]]></quotedText><sensitivity></sensitivity><operator>AND</operator><offsetCooc>0</offsetCooc><cooc></cooc><collection>0</collection><searchOn>all</searchOn><nbResultDisplay>10</nbResultDisplay><nbResultatsParAspect>5</nbResultatsParAspect><nbCoocDisplay>8</nbCoocDisplay><offsetDisplay>0</offsetDisplay><sortBy>date</sortBy><dateAfter>0</dateAfter><dateBefore>0</dateBefore><ipClient>82.122.169.244</ipClient><typeQuery>0</typeQuery><equivToDelete></equivToDelete><allCooc>false</allCooc><versionDTD>3.0.5</versionDTD><r34>1tcbet30]</r34><mi>IND</mi></query>&callback=__gwt_jsonp__.P1.onSuccess&failureCallback=__gwt_jsonp__.P1.onFailure HTTP/1.1";
        Pattern p = Pattern.compile(
                "<\\?xml version='1.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

        Matcher m = p.matcher(s);

        // find() looks for the pattern anywhere in s, so the leading "POST /..." text is skipped
        if (m.find()) {
            System.out.println("Group: " + m.group(1));
        }
    }
}

Prints:

Group: 16914
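
The question does not show how the pattern was applied. Assuming the original code called Matcher.matches(), that alone would explain the failure: matches() succeeds only if the entire input matches, so the leading "POST /..." text defeats it, whereas find() (used above) searches for the pattern anywhere in the input. The sketch below uses a hypothetical class name, QueryIdRegexDemo, and a shortened request line. It shows both calls, plus a narrower pattern that depends only on the ids attribute rather than on the exact XML declaration and the following <matchWord> tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryIdRegexDemo {
    // The pattern from the question, with the dot in "1.0" escaped so it matches only a literal dot.
    static final Pattern FULL = Pattern.compile(
            "<\\?xml version='1\\.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

    // Hypothetical narrower alternative: anchors only on the query element's ids attribute.
    static final Pattern IDS_ONLY = Pattern.compile("<query\\s+ids=\"(\\d+)\"");

    // Returns the first capture group, or null when find() locates no match.
    static String extractId(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened request line; only the part up to <matchWord> matters for these patterns.
        String line = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery="
                + "<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord>"
                + "</query>&callback=__gwt_jsonp__.P1.onSuccess HTTP/1.1";

        // matches() must consume the whole input, so the leading "POST /..." makes it fail.
        System.out.println(FULL.matcher(line).matches()); // false
        // find() scans for the pattern anywhere in the input.
        System.out.println(extractId(FULL, line));        // 16914
        System.out.println(extractId(IDS_ONLY, line));    // 16914
    }
}
```

The narrower pattern is less brittle if the client ever changes the encoding declaration or reorders the child elements, at the cost of not checking that the match really sits inside the xmlQuery parameter.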

Upvotes: 1

hd1

Reputation: 34677

I'd use SAX for this purpose:

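import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XMLParser extends DefaultHandler {
    int id;

    // DefaultHandler's parameter order is (uri, localName, qName, attributes)
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
        if (qName.equals("query")) {
            // the attribute in the request is "ids", not "id"
            id = Integer.parseInt(attrs.getValue("ids"));
        }
    }

    @Override
    public String toString() {
        return String.format("%d", this.id);
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        XMLParser parserObj = new XMLParser();
        // args[0] is assumed to be a file holding only the XML part of the request
        parser.parse(new File(args[0]), parserObj);
        System.out.println(parserObj);
    }
}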
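
The handler above reads the XML from a file, but in the question the XML arrives inside the request line. A minimal sketch, assuming the xmlQuery value is already URL-decoded (as in the line shown in the question) and contains no "&" of its own (an escaped &amp; inside the query would cut it short), is to slice out the parameter value and hand it to SAX through a StringReader. The class name QueryIdFromRequest and the helper xmlQueryParam are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QueryIdFromRequest {
    // Records the "ids" attribute of the first <query> element it sees.
    static class QueryIdHandler extends DefaultHandler {
        String ids;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (ids == null && "query".equals(qName)) {
                ids = attrs.getValue("ids");
            }
        }
    }

    // Hypothetical helper: returns the text between "xmlQuery=" and the next '&'.
    // Assumes the XML itself contains no '&' and is not percent-encoded.
    static String xmlQueryParam(String requestLine) {
        String key = "xmlQuery=";
        int start = requestLine.indexOf(key);
        if (start < 0) {
            throw new IllegalArgumentException("no xmlQuery parameter");
        }
        start += key.length();
        int end = requestLine.indexOf('&', start);
        if (end < 0) {
            throw new IllegalArgumentException("xmlQuery is not followed by another parameter");
        }
        return requestLine.substring(start, end);
    }

    static String extractIds(String requestLine) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        QueryIdHandler handler = new QueryIdHandler();
        // Parsing from a character stream, so no byte decoding is needed.
        parser.parse(new InputSource(new StringReader(xmlQueryParam(requestLine))), handler);
        return handler.ids;
    }

    public static void main(String[] args) throws Exception {
        String line = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery="
                + "<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord>"
                + "<fullText><![CDATA[]]></fullText></query>&callback=__gwt_jsonp__.P1.onSuccess HTTP/1.1";
        System.out.println(extractIds(line)); // 16914
    }
}
```

Unlike a regex, the parser also rejects a malformed query instead of silently matching part of it, which may or may not be what you want for a request log.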

Upvotes: 1
