I have this piece of code that validates XML against an XSD:
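using System.Xml.Linq;
using System.Xml.Schema;

public class XmlValidation // enclosing class name is illustrative
{
    public void Validate()
    {
        // Load the instance document and the schema ("XML path"/"XSD path" are placeholders)
        XDocument xdoc = XDocument.Load("XML path");
        var schemas = new XmlSchemaSet();
        schemas.Add(null, "XSD path"); // null: use the schema's own targetNamespace
        // XDocument.Validate is an extension method from System.Xml.Schema.Extensions
        xdoc.Validate(schemas, ValidationCallBack);
    }

    // Ignore warnings and turn schema errors into exceptions
    private void ValidationCallBack(object sender, ValidationEventArgs args)
    {
        if (args.Severity != XmlSeverityType.Error)
            return;
        throw new XmlSchemaValidationException(args.Message);
    }
}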
If the XSD declares an element of type string with the pattern ([^\t\r\n]*), and the XML element value is
<tagname> There is LF character here
</tagname>
It passes validation, even though the element value has a trailing LF character. How can I make it invalid so that XML validation fails? Note that I cannot modify the XSD.
There are several interesting aspects to this question.
Parsing and validating XML documents is done in a technology stack involving decoding, parsing, converting to an XML Information Set (infoset), and validating against an XML Schema.
Before parsing, the XML specification requires every literal CR LF pair to be replaced with a single LF, and every remaining CR to be replaced with LF as well, so only LF line breaks are left. The parser will thus not see any CR character except in some corner cases, for example a CR written as the character reference &#xD;, which is preserved.
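A quick way to observe this end-of-line handling from C# is to parse a string containing literal CR characters and look at the resulting text. This is a minimal sketch: EolDemo and RootText are hypothetical names, and it relies on XDocument.Parse reading through an XmlReader, which applies the normalization.

```csharp
using System;
using System.Xml.Linq;

public static class EolDemo
{
    // Hypothetical helper: parse raw XML text and return the root element's text content.
    public static string RootText(string rawXml)
    {
        XDocument doc = XDocument.Parse(rawXml);
        return doc.Root.Value;
    }
}
```

For example, EolDemo.RootText("<a>x\r\ny\rz</a>") yields "x\ny\nz": both the CR LF pair and the lone CR have become LF before any later stage sees the text.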
When converting to the infoset, white space (including LF) that appears outside the document element is omitted. (This is my understanding of "trailing" in the question; there is also the concept of trailing white space in attributes.) Thus, once the XML infoset of the document has been built, no information about trailing white space is left.
XML Schema validation is performed against this infoset, so the schema does not see any trailing white space either.
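The effect on validation can be checked with a small sketch. The inline schema is an assumption meant to mirror the question's: an element tagname of type xs:string restricted by the pattern [^\t\r\n]*. A document validates the same way whether or not line breaks follow the closing root tag, because that white space never reaches the validator. SchemaDemo, BuildSchemas and IsValid are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaDemo
{
    // Assumed schema: one string element whose value may not contain TAB, CR or LF.
    private const string Xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""tagname"">
    <xs:simpleType>
      <xs:restriction base=""xs:string"">
        <xs:pattern value=""[^\t\r\n]*""/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>";

    public static XmlSchemaSet BuildSchemas()
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xsd)))
        {
            schemas.Add(null, reader); // null: take the (empty) targetNamespace from the schema
        }
        schemas.Compile();
        return schemas;
    }

    // Returns true when validation reports no errors (warnings are ignored).
    public static bool IsValid(string rawXml, XmlSchemaSet schemas)
    {
        XDocument doc = XDocument.Parse(rawXml);
        int errors = 0;
        doc.Validate(schemas, (sender, args) =>
        {
            if (args.Severity == XmlSeverityType.Error)
                errors++;
        });
        return errors == 0;
    }
}
```

With these helpers, "<tagname>abc</tagname>" and "<tagname>abc</tagname>\n\n" are both reported valid, while "<tagname>a\tb</tagname>" is rejected, which shows the pattern itself is enforced on the element content.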
Checking for trailing CR or LF characters in the instance, even though it does make sense, is thus outside the scope of Schema validation and should be done with other tools ahead of the XML processing phase.
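One way to do such a check ahead of XML processing, sketched under some assumptions: the raw document is already available as a string (for a file, e.g. via File.ReadAllText, which assumes the encoding is UTF-8 or detectable from a BOM), and "trailing" means anything after the last '>' of the document. RawChecks and HasTrailingLineBreak are hypothetical names; this is a plain text check, not part of XSD validation.

```csharp
using System;

public static class RawChecks
{
    // Hypothetical pre-parse check: does any CR or LF follow the last '>' of the raw text?
    public static bool HasTrailingLineBreak(string rawXml)
    {
        if (rawXml == null) throw new ArgumentNullException(nameof(rawXml));
        int lastMarkupEnd = rawXml.LastIndexOf('>');
        // In a well-formed document only white space can follow the final '>'.
        string tail = lastMarkupEnd >= 0 ? rawXml.Substring(lastMarkupEnd + 1) : rawXml;
        return tail.IndexOfAny(new[] { '\r', '\n' }) >= 0;
    }
}
```

In the question's Validate method, this would run on the raw text before XDocument.Load or XDocument.Parse, throwing when it returns true; once the document has been parsed, the information is no longer available.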