armin

Reputation: 2025

Is there a way to ignore all XML parsing exceptions?

I have to parse user documents that are sometimes not well-formed. They might contain spaces before tags or some other issue. How can I make them well-formed, or, if this isn't possible, how can I ignore all exceptions? I also get exceptions about the byte order mark, because the document is UTF-16 encoded but has no byte order mark, and I can't add one because they are user files.

Okay, can anyone tell me what's wrong with this sample data? (This is the note from the device documentation: "All the exchanges generated by this protocol will be carried out by using an XML file conform with the XSD described in this document.")

     <?xml version="1.0" encoding="UTF-16"?>
     <PROTOCOLE_HEMATO_BIOCODE InstrumentCode="2" InstrumentType="Diana 5 Evolution"   SerialNumber="Ns" Version="C4.06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <PROTOCOL_DATA>
     <RESULT>
     <INFORMATION>
     <PATIENT DoB="2011-08-03" FirstName="ALI" Location="" MedicalDoctor="" Name="NAVIDI" PatientCommentary="" PID="" RefTable="1" SID="1059"/>
     </INFORMATION>
     <DATAS DateTimeAnalyse="2011-08-03T11:36:11Z" IdOpAnalyse="Service" UnitsSytem="US">
     <PARAMETER IDParametre="0" LowerRefLimit="4" Nom="WBC" Statut_Limits="48" Units="K/µL" UpperRefLimit="10" Value="4.6"/>
     <PARAMETER IDParametre="1" LowerRefLimit="20" Nom="Lym%" Statut_Limits="48" Units="%" UpperRefLimit="45" Value="37.8"/>
     <PARAMETER IDParametre="2" LowerRefLimit="2" Nom="Mon%" Statut_Limits="48" Units1111="%" UpperRefLimit="8" Value="6"/>
     <PARAMETER IDParametre="3" LowerRefLimit="40"Nom="Neu%" Statut_Limits="48" Units="%" UpperRefLimit="75" Value="51.8"/>
     <PARAMETER IDParametre="4" LowerRefLimit="0" Nom="Bas%" Statut_Limits="48" Units="%" UpperRefLimit="3" Value="0"/>
     <PARAMETER IDParametre="5" LowerRefLimit="1" Nom="Eos%" Statut_Limits="48" Units="%" UpperRefLimit="7" Value="4.4"/>
     <PARAMETER IDParametre="7" LowerRefLimit="1.5" Nom="Lym#" Statut_Limits="48" Units="K/µL" UpperRefLimit="4.5" Value="1.7"/>
     <PARAMETER IDParametre="8" Nom="Mon#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.8" Value="0.28"/>
     <PARAMETER IDParametre="9" LowerRefLimit="2" Nom="Neu#" Statut_Limits="48" Units="K/µL" UpperRefLimit="7.5" Value="2.4"/>
     <PARAMETER IDParametre="10" Nom="Bas#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.2" Value="0"/>
     <PARAMETER IDParametre="11" Nom="Eos#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.6" Value="0.2"/>
     <PARAMETER IDParametre="21" LowerRefLimit="4.5" Nom="RBC" Statut_Limits="48" Units="M/µL" UpperRefLimit="6.2" Value="5.11"/>
     <PARAMETER IDParametre="22" LowerRefLimit="12" Nom="Hb" Statut_Limits="48" Units="g/dL" UpperRefLimit="18" Value="16.2"/>
     <PARAMETER IDParametre="23" LowerRefLimit="35" Nom="Hct" Statut_Limits="48" Units="%" UpperRefLimit="54" Value="48.8"/>
     <PARAMETER IDParametre="24" LowerRefLimit="80" Nom="MCV" Statut_Limits="51" Units="fL" UpperRefLimit="95" Value="95.5"/>
     <PARAMETER IDParametre="25" LowerRefLimit="27" Nom="MCH" Statut_Limits="48" Units="pg" UpperRefLimit="32" Value="31.7"/>
     <PARAMETER IDParametre="26" LowerRefLimit="32" Nom="MCHC" Statut_Limits="48" Units="%" UpperRefLimit="36" Value="33.2"/>
     <PARAMETER IDParametre="27" LowerRefLimit="11" Nom="RDW-cv" Statut_Limits="48" Units="%" UpperRefLimit="15" Value="10.6"/>
     <PARAMETER IDParametre="28" Nom="RDW-sd" Statut_Limits="48" Units="fL" Value="33.9"/>
     <PARAMETER IDParametre="29" LowerRefLimit="150" Nom="Plt" Statut_Limits="48" Units="K/µL" UpperRefLimit="500" Value="200"/>
     <PARAMETER IDParametre="30" LowerRefLimit="6" Nom="MPV" Statut_Limits="48" Units="fL" UpperRefLimit="10" Value="7.3"/>
     <PARAMETER IDParametre="31" Nom="Pct" Statut_Limits="48" Units="%" Value="0.15"/>
     <PARAMETER IDParametre="32" Nom="PDW" Statut_Limits="48" Units="%" Value="8.4"/>
     <PARAMETER IDParametre="33" Nom="Lx" Statut_Limits="48" Units=" " Value="20"/>
     <PARAMETER IDParametre="34" Nom="Ly" Statut_Limits="48" Units=" " Value="16"/>
     <PARAMETER IDParametre="35" Nom="Nx" Statut_Limits="48" Units=" " Value="59"/>
     </DATAS>
     <TRACABILITE IDOpValidation="" ModeleAnalyseur="Diana 5 Evolution" SerialNumber="" VersionCalcul="C4.06" VersionPackage="V6.26">
     <REACTIF ExpirationDate="2014-07-31" Lot="562" Product="HEMATON-5    "/>
     <REACTIF ExpirationDate="2014-05-04" Lot="12452" Product="HEMACORE    "/>
     <REACTIF ExpirationDate="2013-07-03" Lot="73049" Product="HEMALYSE-5    "/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="0" ParameterName="WBC"/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="21" ParameterName="RBC"/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="22" ParameterName="Hb"/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="24" ParameterName="MCV"/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="29" ParameterName="Plt"/>
     <FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="30" ParameterName="MPV"/>
     </TRACABILITE>
     <IMAGE DataSize="6676" ImageType="3">
     <IMAGE_DATA>AQAAA
     </IMAGE_DATA>
     </IMAGE>
     </RESULT>
     </PROTOCOL_DATA>
     </PROTOCOLE_HEMATO_BIOCODE>

Upvotes: 1

Views: 2587

Answers (3)

Justin

Reputation: 86729

Just to be clear:

  • Just because something looks like XML doesn't mean that it is XML. If your document is not a well-formed XML document, then it isn't an XML document. From the specification:

A data object is an XML document if it is well-formed

  • If your document is not XML, then you can't parse it using an XML parser.
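To see why "ignoring" the errors is not an option, here is a minimal sketch (the class `WellFormedCheck` and method `Describe` are hypothetical names) that tries to load a string with `XmlDocument.LoadXml` and reports where the parser gave up. In the question's sample, `LowerRefLimit="40"Nom="Neu%"` has no whitespace between two attributes, which the XML specification requires, so the parser stops there with an `XmlException`; it does not skip the error and continue.

```csharp
using System;
using System.Xml;

// Hypothetical helper: reports whether a string is well-formed XML and,
// if not, the line/position where XmlDocument stopped parsing.
public static class WellFormedCheck
{
    public static string Describe(string xml)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(xml);
            return "well-formed";
        }
        catch (XmlException ex)
        {
            // The parser aborts at the first well-formedness error;
            // there is no setting that makes it skip the error and keep going.
            return $"not well-formed at line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

Running `Describe` on a file that fails to load at least tells you which line to look at, which is usually the quickest way to answer "what's wrong with this data".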

If it is just an encoding problem, you can specify the encoding when reading the file:

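using System.IO;
using System.Text;
using System.Xml;

// Encoding.Unicode is UTF-16 little-endian; StreamReader falls back to it when the file has no BOM.
using (StreamReader reader = new StreamReader("myfile.xml", Encoding.Unicode))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
}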

The above will load the file "myfile.xml" as UTF-16 using little-endian byte order.
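The same idea can be checked without a file on disk. This is a minimal sketch, assuming the user files really are UTF-16 little-endian without a BOM (big-endian files would need `Encoding.BigEndianUnicode` instead); `Utf16NoBom.Load` is a hypothetical helper name. Because the XML is handed to `XmlDocument` through a `TextReader`, the characters are already decoded, so the `encoding="UTF-16"` in the declaration is not used to pick a decoder.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: loads XML from raw bytes that are UTF-16LE and may lack a BOM.
public static class Utf16NoBom
{
    public static XmlDocument Load(byte[] bytes)
    {
        // If a BOM is present it is detected and honored; otherwise the bytes
        // are decoded as UTF-16LE (Encoding.Unicode). This is an assumption about the input.
        using var reader = new StreamReader(new MemoryStream(bytes), Encoding.Unicode,
                                            detectEncodingFromByteOrderMarks: true);
        var doc = new XmlDocument();
        doc.Load(reader);
        return doc;
    }
}
```

For a real file, `File.ReadAllBytes(path)` can supply the `bytes` argument.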

Upvotes: 0

Kaerber

Reputation: 1653

You can try to use SAX for .NET, available at http://saxdotnet.sourceforge.net

It's not a document-parsing API but a tag-parsing (event-based) one, so it shouldn't throw exceptions on XML documents that are not well-formed. But you'll have to write all the logic to process tags yourself.
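To give an idea of what writing the tag logic yourself involves, here is a minimal sketch that does not use SAX for .NET at all; it is a swapped-in, regex-based scanner with hypothetical names (`TagScanner`, `ScanElements`). It pulls the attributes out of every tag with a given element name and, because it does not require whitespace between attributes, it still reads `LowerRefLimit="40"Nom="Neu%"` from the question. It is a heuristic, not a parser: it assumes `>` never appears inside attribute values and ignores nesting entirely.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical regex-based scanner; tolerant of some malformed input, but not an XML parser.
public static class TagScanner
{
    public static List<Dictionary<string, string>> ScanElements(string text, string elementName)
    {
        var results = new List<Dictionary<string, string>>();
        // Opening or self-closing tags with this name; group 1 is the raw attribute text.
        var tagPattern = new Regex("<" + Regex.Escape(elementName) + @"\b([^>]*)>");
        // name="value" pairs; no whitespace is required between pairs.
        var attrPattern = new Regex(@"([\w:.-]+)\s*=\s*""([^""]*)""");

        foreach (Match tag in tagPattern.Matches(text))
        {
            var attrs = new Dictionary<string, string>();
            foreach (Match a in attrPattern.Matches(tag.Groups[1].Value))
                attrs[a.Groups[1].Value] = a.Groups[2].Value;
            results.Add(attrs);
        }
        return results;
    }
}
```

For the sample data, `ScanElements(text, "PARAMETER")` would return one dictionary per `<PARAMETER .../>` line, keyed by attribute name.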

Upvotes: 0

Bas Slagter

Reputation: 9929

You can write (or look online for) an XML sanitizer method, class, or library. Basically, you need to clean up the XML line by line (removing spaces and such) before you can parse it correctly. What you have now probably can't even be called XML.
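A minimal sanitizer sketch, targeting only the two problems visible in the question (the class and method names `XmlSanitizer`, `Sanitize`, and `LoadSanitized` are hypothetical). First, nothing, not even whitespace, may precede the `<?xml ...?>` declaration, so leading whitespace is trimmed. Second, a missing space between a closing attribute quote and the next attribute name is inserted. This is a heuristic regex fix, not a general repair tool; anything else that is malformed will still make `LoadXml` throw.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical sanitizer for two specific defects; it cannot fix arbitrary malformed input.
public static class XmlSanitizer
{
    public static string Sanitize(string text)
    {
        // 1. The XML declaration must be the very first thing in the document.
        text = text.TrimStart();

        // 2. LowerRefLimit="40"Nom="Neu%"  ->  LowerRefLimit="40" Nom="Neu%"
        //    Heuristic: assumes attribute values never contain '=' followed by a quote.
        text = Regex.Replace(text, @"([\w:.-]+=""[^""]*"")(?=[\w:])", "$1 ");
        return text;
    }

    public static XmlDocument LoadSanitized(string text)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sanitize(text)); // still throws XmlException if other problems remain
        return doc;
    }
}
```

Running the sanitizer over every file before parsing keeps the parser strict while fixing the known, recurring defects; files with new kinds of damage will still surface as exceptions you can log.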

Upvotes: 1
