Reputation: 2025
I have to parse user documents that are sometimes not well formed. They might contain spaces before tags or have some other issue. How can I make them well formed, or, if that isn't possible, how can I ignore all exceptions? I also get exceptions about the byte order mark, because the documents are in UTF-16 encoding but have no byte order mark, and I can't add one because they are user files.
Okay, can anyone tell me what's wrong with this sample data? (This is the note from the device documentation: "All the exchanges generated by this protocol will be carried out by using an XML file conform with the XSD described in this document.")
<?xml version="1.0" encoding="UTF-16"?>
<PROTOCOLE_HEMATO_BIOCODE InstrumentCode="2" InstrumentType="Diana 5 Evolution" SerialNumber="Ns" Version="C4.06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<PROTOCOL_DATA>
<RESULT>
<INFORMATION>
<PATIENT DoB="2011-08-03" FirstName="ALI" Location="" MedicalDoctor="" Name="NAVIDI" PatientCommentary="" PID="" RefTable="1" SID="1059"/>
</INFORMATION>
<DATAS DateTimeAnalyse="2011-08-03T11:36:11Z" IdOpAnalyse="Service" UnitsSytem="US">
<PARAMETER IDParametre="0" LowerRefLimit="4" Nom="WBC" Statut_Limits="48" Units="K/µL" UpperRefLimit="10" Value="4.6"/>
<PARAMETER IDParametre="1" LowerRefLimit="20" Nom="Lym%" Statut_Limits="48" Units="%" UpperRefLimit="45" Value="37.8"/>
<PARAMETER IDParametre="2" LowerRefLimit="2" Nom="Mon%" Statut_Limits="48" Units1111="%" UpperRefLimit="8" Value="6"/>
<PARAMETER IDParametre="3" LowerRefLimit="40"Nom="Neu%" Statut_Limits="48" Units="%" UpperRefLimit="75" Value="51.8"/>
<PARAMETER IDParametre="4" LowerRefLimit="0" Nom="Bas%" Statut_Limits="48" Units="%" UpperRefLimit="3" Value="0"/>
<PARAMETER IDParametre="5" LowerRefLimit="1" Nom="Eos%" Statut_Limits="48" Units="%" UpperRefLimit="7" Value="4.4"/>
<PARAMETER IDParametre="7" LowerRefLimit="1.5" Nom="Lym#" Statut_Limits="48" Units="K/µL" UpperRefLimit="4.5" Value="1.7"/>
<PARAMETER IDParametre="8" Nom="Mon#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.8" Value="0.28"/>
<PARAMETER IDParametre="9" LowerRefLimit="2" Nom="Neu#" Statut_Limits="48" Units="K/µL" UpperRefLimit="7.5" Value="2.4"/>
<PARAMETER IDParametre="10" Nom="Bas#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.2" Value="0"/>
<PARAMETER IDParametre="11" Nom="Eos#" Statut_Limits="48" Units="K/µL" UpperRefLimit="0.6" Value="0.2"/>
<PARAMETER IDParametre="21" LowerRefLimit="4.5" Nom="RBC" Statut_Limits="48" Units="M/µL" UpperRefLimit="6.2" Value="5.11"/>
<PARAMETER IDParametre="22" LowerRefLimit="12" Nom="Hb" Statut_Limits="48" Units="g/dL" UpperRefLimit="18" Value="16.2"/>
<PARAMETER IDParametre="23" LowerRefLimit="35" Nom="Hct" Statut_Limits="48" Units="%" UpperRefLimit="54" Value="48.8"/>
<PARAMETER IDParametre="24" LowerRefLimit="80" Nom="MCV" Statut_Limits="51" Units="fL" UpperRefLimit="95" Value="95.5"/>
<PARAMETER IDParametre="25" LowerRefLimit="27" Nom="MCH" Statut_Limits="48" Units="pg" UpperRefLimit="32" Value="31.7"/>
<PARAMETER IDParametre="26" LowerRefLimit="32" Nom="MCHC" Statut_Limits="48" Units="%" UpperRefLimit="36" Value="33.2"/>
<PARAMETER IDParametre="27" LowerRefLimit="11" Nom="RDW-cv" Statut_Limits="48" Units="%" UpperRefLimit="15" Value="10.6"/>
<PARAMETER IDParametre="28" Nom="RDW-sd" Statut_Limits="48" Units="fL" Value="33.9"/>
<PARAMETER IDParametre="29" LowerRefLimit="150" Nom="Plt" Statut_Limits="48" Units="K/µL" UpperRefLimit="500" Value="200"/>
<PARAMETER IDParametre="30" LowerRefLimit="6" Nom="MPV" Statut_Limits="48" Units="fL" UpperRefLimit="10" Value="7.3"/>
<PARAMETER IDParametre="31" Nom="Pct" Statut_Limits="48" Units="%" Value="0.15"/>
<PARAMETER IDParametre="32" Nom="PDW" Statut_Limits="48" Units="%" Value="8.4"/>
<PARAMETER IDParametre="33" Nom="Lx" Statut_Limits="48" Units=" " Value="20"/>
<PARAMETER IDParametre="34" Nom="Ly" Statut_Limits="48" Units=" " Value="16"/>
<PARAMETER IDParametre="35" Nom="Nx" Statut_Limits="48" Units=" " Value="59"/>
</DATAS>
<TRACABILITE IDOpValidation="" ModeleAnalyseur="Diana 5 Evolution" SerialNumber="" VersionCalcul="C4.06" VersionPackage="V6.26">
<REACTIF ExpirationDate="2014-07-31" Lot="562" Product="HEMATON-5 "/>
<REACTIF ExpirationDate="2014-05-04" Lot="12452" Product="HEMACORE "/>
<REACTIF ExpirationDate="2013-07-03" Lot="73049" Product="HEMALYSE-5 "/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="0" ParameterName="WBC"/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="21" ParameterName="RBC"/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="22" ParameterName="Hb"/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="24" ParameterName="MCV"/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="29" ParameterName="Plt"/>
<FACTEUR_CALIBRATION FactorDate="2011-07-31" FactorValue="1" IDParametre="30" ParameterName="MPV"/>
</TRACABILITE>
<IMAGE DataSize="6676" ImageType="3">
<IMAGE_DATA>AQAAA
</IMAGE_DATA>
</IMAGE>
</RESULT>
</PROTOCOL_DATA>
</PROTOCOLE_HEMATO_BIOCODE>
Upvotes: 1
Views: 2587
Reputation: 86729
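Just to be clear, quoting the XML specification:
"A data object is an XML document if it is well-formed."
So a file that is not well-formed is not XML, and an XML parser is expected to reject it.
If it is just an encoding problem, then you can specify the encoding when reading the file: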
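// Requires: using System.IO; using System.Text; using System.Xml;
// Encoding.Unicode is UTF-16 little endian; it is used when the file has no byte order mark.
using (StreamReader reader = new StreamReader("myfile.xml", Encoding.Unicode))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
}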
The above loads the file "myfile.xml" as UTF-16 with little-endian byte order, even when the file has no byte order mark. For big-endian files, use Encoding.BigEndianUnicode instead.
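To see what the parser actually objects to, it can help to catch the XmlException and print where it occurred. Below is a minimal sketch, assuming the file is UTF-16 little endian without a byte order mark; the names XmlLoadCheck and FindFirstError are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlLoadCheck
{
    // Hypothetical helper: returns null when the file parses,
    // otherwise a description of the first well-formedness error.
    public static string FindFirstError(string path, Encoding encoding)
    {
        try
        {
            // The encoding passed here is used when the file has no byte order mark.
            using (var reader = new StreamReader(path, encoding))
            {
                var doc = new XmlDocument();
                doc.Load(reader);
            }
            return null;
        }
        catch (XmlException ex)
        {
            // XmlException carries the position of the offending markup.
            return string.Format("Line {0}, position {1}: {2}",
                                 ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: XmlLoadCheck <file>");
            return;
        }
        string error = FindFirstError(args[0], Encoding.Unicode);
        Console.WriteLine(error ?? "Document is well-formed.");
    }
}
```

Run against the sample in the question, this should point at the PARAMETER line where there is no space between LowerRefLimit="40" and Nom, which is enough to make the document not well-formed.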
Upvotes: 0
Reputation: 1653
You can try SAX for .NET, available at http://saxdotnet.sourceforge.net
It's an event-based, tag-by-tag parsing API rather than a document-loading one, so it shouldn't throw away the whole document when it is not well-formed. But you'll have to write all the logic to process the tags yourself.
Upvotes: 0
Reputation: 9929
You can write (or search the internet for) an XML sanitizer method, class, or library. Basically, you need to clean up the text line by line (removing stray spaces and the like) before you can parse it correctly. What you have now probably can't even be called XML.
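As a rough illustration of what such a sanitizer could look like, here is a minimal sketch that repairs only two things: whitespace before the first tag, and a missing space between two attributes (as in LowerRefLimit="40"Nom="Neu%" from the question). The class XmlSanitizerSketch and its regex are my own assumptions, not an existing library, and the regex is a heuristic that only handles double-quoted attribute values.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class XmlSanitizerSketch
{
    // Heuristic: a complete name="value" pair that is immediately followed by
    // something other than whitespace, '/', '>' or '?' is missing a separator.
    static readonly Regex MissingSpace =
        new Regex(@"([\w:.-]+\s*=\s*""[^""]*"")(?=[^\s/>?])");

    public static string Sanitize(string raw)
    {
        // Whitespace before the XML declaration is not allowed, so drop it.
        string text = raw.TrimStart();
        // Insert the missing space after each attribute that lacks one.
        return MissingSpace.Replace(text, "$1 ");
    }

    static void Main()
    {
        string broken =
            "  <PARAMETER IDParametre=\"3\" LowerRefLimit=\"40\"Nom=\"Neu%\" Value=\"51.8\"/>";

        var doc = new XmlDocument();
        doc.LoadXml(Sanitize(broken)); // throws XmlException if still malformed
        Console.WriteLine(doc.DocumentElement.GetAttribute("Nom"));
    }
}
```

Anything the heuristic does not cover will still raise an XmlException, so it is worth logging those files rather than silently swallowing the error.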
Upvotes: 1