user3065205

Reputation: 147

How can I convert an xsd:pattern to a Java regex?

I have very little experience with Java regex. Is there a method (or tool) to convert an xsd:pattern restriction into a Java regex?

My xsd:pattern definitions are as follows:

<xsd:simpleType name="myCodex">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[A-Za-z]{6}[0-9]{2}[A-Za-z]{1}[0-9]{2}[A-Za-z]{1}[0-9A-Za-z]{3}[A-Za-z]{1}" />
    <xsd:pattern value="[A-Za-z]{6}[0-9LMNPQRSTUV]{2}[A-Za-z]{1}[0-9LMNPQRSTUV]{2}[A-Za-z]{1}[0-9LMNPQRSTUV]{3}[A-Za-z]{1}" />
    <xsd:pattern value="[0-9]{11,11}" />
  </xsd:restriction>
</xsd:simpleType>

Upvotes: 5

Views: 2033

Answers (1)

helderdarocha

Reputation: 23627

You can load the XSD into Java and extract the expressions. Then you can pass them to String.matches(), or compile them into Pattern objects if you are going to reuse them a lot.

First you need to load the XSD, which is itself an XML document, into a Java program (I called the file CodexSchema.xsd):

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document source = builder.parse(new File("CodexSchema.xsd"));

Then you can use XPath to find the patterns you want to extract (you might want to create a method that takes the name of the simple type, if you have many to process). I used a more complicated XPath expression to avoid registering the namespaces:

XPathFactory xPathfactory = XPathFactory.newInstance();
String typeName = "myCodex";
String xPathRoot = "//*[local-name()='simpleType'][@name='"+typeName+"']/*[local-name()='restriction']/*[local-name()='pattern']";
XPath patternsXPath = xPathfactory.newXPath(); // evaluator used to run the expression against the document

Running that expression gives you an org.w3c.dom.NodeList containing the <xsd:pattern> elements:

NodeList patternNodes = (NodeList)patternsXPath.evaluate(xPathRoot, source, XPathConstants.NODESET);

Now you can loop through them and extract the contents of their value attribute. You might want to write a method for that:

public List<Pattern> getPatterns(NodeList patternNodes) {
    List<Pattern> expressions = new ArrayList<>();
    for(int i = 0; i < patternNodes.getLength(); i++) {
        Element patternNode = (Element)patternNodes.item(i);
        String regex = patternNode.getAttribute("value");
        expressions.add(Pattern.compile(regex));
    }
    return expressions;
}

You don't really need to compile them into Pattern objects. You could simply keep them as String values and pass them to String.matches().

You can now read all your patterns in Java using:

for(Pattern p : getPatterns(patternNodes)) {
    System.out.println(p);
}
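
Putting the steps above together, here is a minimal self-contained sketch. The class name XsdPatternExtractor and the method patternsFor are hypothetical, the schema is shortened and inlined as a string so that no file is needed, and setNamespaceAware(true) is an extra precaution of mine rather than something the snippets above require:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is not part of any library.
public class XsdPatternExtractor {

    // Parses the schema text and compiles every pattern facet of the named simple type.
    public static List<Pattern> patternsFor(String xsd, String typeName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: namespace-aware parsing, so elements report their local names
            // regardless of the prefix (xs:, xsd:, ...) used in the schema.
            factory.setNamespaceAware(true);
            Document source = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));

            String expr = "//*[local-name()='simpleType'][@name='" + typeName + "']"
                    + "/*[local-name()='restriction']/*[local-name()='pattern']";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, source, XPathConstants.NODESET);

            List<Pattern> patterns = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                patterns.add(Pattern.compile(node.getAttribute("value")));
            }
            return patterns;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers do not need checked exceptions.
            throw new IllegalArgumentException("Could not read patterns for " + typeName, e);
        }
    }

    public static void main(String[] args) {
        // Shortened inline schema so the sketch runs without a file on disk.
        String xsd =
              "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:simpleType name=\"myCodex\">"
            + "<xsd:restriction base=\"xsd:string\">"
            + "<xsd:pattern value=\"[0-9]{11,11}\"/>"
            + "</xsd:restriction>"
            + "</xsd:simpleType>"
            + "</xsd:schema>";
        for (Pattern p : patternsFor(xsd, "myCodex")) {
            System.out.println(p);
        }
    }
}
```

A type name that does not exist in the schema simply yields an empty list, so callers may want to treat that case as an error.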

Here are some tests with the third pattern:

Pattern pattern3 = getPatterns(patternNodes).get(2);

Matcher matcher = pattern3.matcher("47385628403");
// matches() tests the whole string, as XSD does; find() would also accept a partial match
System.out.println("test1: " + matcher.matches());  // prints `test1: true`

System.out.println("test2: " + "47385628403".matches(pattern3.toString()));  // prints `test2: true`
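
To validate a value the way the schema would, keep two XSD rules in mind: several xsd:pattern facets in the same restriction are alternatives (the value has to match at least one), and each pattern has to match the whole value. The sketch below illustrates that with the three patterns from the question hard-coded. CodexValidator and isValid are hypothetical names, and the sample values are made up. XSD regex syntax is not identical to Java's in general (for example the \i and \c escapes and character class subtraction), but these three patterns only use constructs that both share:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator; the class and method names are illustrative only.
public class CodexValidator {

    // Pattern values copied from the myCodex type in the question.
    private static final List<Pattern> PATTERNS = Arrays.asList(
        Pattern.compile("[A-Za-z]{6}[0-9]{2}[A-Za-z]{1}[0-9]{2}[A-Za-z]{1}[0-9A-Za-z]{3}[A-Za-z]{1}"),
        Pattern.compile("[A-Za-z]{6}[0-9LMNPQRSTUV]{2}[A-Za-z]{1}[0-9LMNPQRSTUV]{2}[A-Za-z]{1}[0-9LMNPQRSTUV]{3}[A-Za-z]{1}"),
        Pattern.compile("[0-9]{11,11}"));

    // True if the whole value matches at least one of the patterns.
    public static boolean isValid(String value) {
        for (Pattern p : PATTERNS) {
            if (p.matcher(value).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValid("47385628403"));      // 11 digits: true
        System.out.println(isValid("RSSMRA85T10A562S")); // 16-character sample code: true
        System.out.println(isValid("id=47385628403"));   // extra prefix: false
        // find() only looks for a matching substring, so it accepts the same input
        System.out.println(PATTERNS.get(2).matcher("id=47385628403").find()); // true
    }
}
```

The last line shows why find() is not a good stand-in for schema validation: it reports a hit for input that the XSD itself would reject.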

Upvotes: 2
