Sebastien Lorber

Reputation: 92210

XML schema creation takes a long time

I have the following code:

public XsdValidator(Resource... xsds) {
    Preconditions.checkArgument(xsds != null);
    try {
        this.xsds = ImmutableList.copyOf(xsds);
        SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI);
        LOGGER.debug("Schema factory created: {}", schemaFactory);
        StreamSource[] streamSources = streamSourcesOf(xsds);
        LOGGER.debug("StreamSource[] created: {}", streamSources);
        Schema schema = schemaFactory.newSchema(streamSources);
        LOGGER.debug("Schema created: {}", schema);
        validator = schema.newValidator();
        LOGGER.debug("Validator created: {}", validator);
    } catch (Exception e) {
        throw new IllegalArgumentException("Can't build XsdValidator", e);
    }
}

It seems the line schemaFactory.newSchema(streamSources); takes a very long time (30 seconds) to execute against my XSD file.

After many tests on this XSD, it seems it's because I have:

  <xs:complexType name="entriesType">
    <xs:sequence>
      <xs:element type="prov:entryType" name="entry" minOccurs="0" maxOccurs="10000" />
    </xs:sequence>
  </xs:complexType>

The problem is maxOccurs="10000".

With maxOccurs="1" or maxOccurs="unbounded", it is very fast.
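To reproduce the comparison in isolation, a minimal sketch along the following lines could time schema compilation for different maxOccurs values. The class and method names (MaxOccursTiming, xsdWithMaxOccurs, compileMillis) are made up for illustration, and xs:string stands in for prov:entryType, which is not shown here. Timings will depend on the JDK and parser in use.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original XsdValidator.
class MaxOccursTiming {

    // Builds a minimal schema shaped like entriesType.
    // Assumption: xs:string replaces prov:entryType to keep the schema self-contained.
    static String xsdWithMaxOccurs(String maxOccurs) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:complexType name=\"entriesType\"><xs:sequence>"
             + "<xs:element type=\"xs:string\" name=\"entry\" minOccurs=\"0\" maxOccurs=\""
             + maxOccurs + "\"/>"
             + "</xs:sequence></xs:complexType>"
             + "<xs:element name=\"entries\" type=\"entriesType\"/>"
             + "</xs:schema>";
    }

    // Compiles the schema text; wraps the checked exception like the constructor above does.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Can't compile schema", e);
        }
    }

    // Returns the wall-clock time in milliseconds spent in newSchema for one maxOccurs value.
    static long compileMillis(String maxOccurs) {
        long start = System.nanoTime();
        compile(xsdWithMaxOccurs(maxOccurs));
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Pass values such as 10000 on the command line to compare; defaults are the fast cases.
        String[] values = args.length > 0 ? args : new String[] {"1", "unbounded"};
        for (String value : values) {
            System.out.println("maxOccurs=" + value + ": " + compileMillis(value) + " ms");
        }
    }
}
```

Running it with arguments such as `1 unbounded 10000` would show whether the slowdown also appears outside the full XSD.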

Can someone tell me what the problem is with using maxOccurs="10000"?

Upvotes: 4

Views: 1373

Answers (1)

Petru Gardea

Reputation: 21658

In my experience, particles bounded by what some may consider "unreasonably" high values are a cause of performance problems (this link is from my browser's favourites).

The underlying cause seems to be memory allocation on the scale indicated by the maxOccurs value.

Also, I recall a documentation item stating a threshold value beyond which, for all intents and purposes, the parser would treat maxOccurs as unbounded, regardless of what the XSD says (I'll revisit this post if I find it).
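One possible workaround, assuming the exact upper bound may be enforced outside the schema, is to declare the element with maxOccurs="unbounded" (which the question reports as fast) and check the count in code after validation. The sketch below is only an illustration of that idea: BoundedEntriesCheck, unboundedSchema and isValid are hypothetical names, and xs:string stands in for prov:entryType.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating "unbounded in the XSD, limit enforced in code".
class BoundedEntriesCheck {

    // Assumption: simplified schema with xs:string entries and no namespace.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:complexType name=\"entriesType\"><xs:sequence>"
        + "<xs:element type=\"xs:string\" name=\"entry\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name=\"entries\" type=\"entriesType\"/>"
        + "</xs:schema>";

    // Compiles the unbounded schema once; Schema objects can be reused across validations.
    static Schema unboundedSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Can't compile schema", e);
        }
    }

    // True if the document is schema-valid and has at most maxEntries <entry> elements.
    static boolean isValid(Schema schema, String xml, int maxEntries) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            // With no ErrorHandler set, the validator throws SAXException on schema errors.
            schema.newValidator().validate(new DOMSource(doc));
            // Counts every <entry> in the document; fine here because entries cannot nest.
            return doc.getElementsByTagName("entry").getLength() <= maxEntries;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Can't run validation", e);
        }
    }
}
```

The trade-off is that the limit no longer lives in the XSD, so other consumers of the schema will not see it.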

Upvotes: 4
