Reputation: 14065
I am new to XML and I am now trying to read an xml file. I googled and try this way to read xml but I get this error.
Reference to undeclared entity 'Ccaron'. Line 2902, position 9.
When I go to line 2902 I got this,
<H0742>Čopova 14, POB 1725,
SI-1000 Ljubljana</H0742>
This is the way I try
XmlDocument xDoc = new XmlDocument();
xDoc.Load(file);
XmlNodeList nodes = xDoc.SelectNodes("nodeName");
foreach (XmlNode n in nodes)
{
if (n.SelectSingleNode("H0742") != null)
{
row.IrNbr = n.SelectSingleNode("H0742").InnerText;
}
.
.
.
}
When I look at w3school, & is illegal in xml.
EDIT : This is the encoding. I wonder it's related with xml somehow.
encoding='iso-8859-1'
Thanks in advance.
EDIT :
They gave me an .ENT file and I can reference online ftp.MyPartnerCompany.com/name.ent. In this .ENT file I see entities like that
<!ENTITY Cacute "Ć"> <!-- latin capital letter C with acute,
U+0106 Latin Extended-A -->
How can I reference it in my xml Parsing ? I prefer to reference online since they may add new anytime. Thanks in advance !!!
Upvotes: 2
Views: 1094
Reputation: 7143
The first thing to be aware of is that the problem isn't in your software.
As you are new to XML, I'm going to guess that definining entities isn't something you've come across before. Character entities are shortcuts for arbitrary pieces of text (one or more characters). The most common place you are going to see them is in the situation you are in now. At some point, your XML has been created by someone who wanted to type the character 'Č' or 'č' (that's upper and lower case C with Caron if your font can't display it).
However, in XML we only have a few predeclared entities (ampersand, less than, greater than, double quote and apostraphe). Any other character entities need to be declared. In order to parse your file correctly you will need to do one of two things - either replace the character entity with something that doesn't cause the parser issues or declare the entity.
To declare the entity, you can use something called an "internal subset" - a specialised form of the DTD statement you might see at the top of your XML file. Something like this:
<!DOCTYPE root-element
[ <!ENTITY Ccaron "Č">
<!ENTITY ccaron "č">]
>
Placing that statement at the beginning of the XML file (change the 'root-element' to match yours) will allow the parser to resolve the entity.
Alternatively, simply change the Č
to Č
and your problem will also be resolved.
The &#
notation is a numeric entity, giving appropriate unicode value for the character (the 'x' indicates that it's in hex).
You could always just type the character too but that requires knowledge of the ins and outs of your keyboard and region.
Upvotes: 3
Reputation: 66714
Č
is an entity reference. It is likely that the entity reference is intended to be for the character Č, in order to produce: Čopova
.
However, that entity must be declared, or the XML parser will not know what should be substituted for the entity reference as it parses the XML.
Upvotes: 1
Reputation: 57936
Your XML file isn't well-formed and, so, can't be used as XmlDocument. Period.
You have two options:
System.Xml
, but probably concatening several strings, as "XML is just a text file". You should repair it, or opening a generated XML file will be always a surprise.EDIT: As you can't fix your XML generator, I recommend to open it with File.ReadAllText
and execute an regular expression to re-encode that &
or to strip off entire entity (as we can't translate it)
Console.WriteLine(
Regex.Replace("<H0742>Čopova 14, { POB & SI-1000 &</H0742>",
@"&((?!#)\S*?;)?", match =>
{
switch (match.Value)
{
case "<":
case ">":
case "&":
case """:
case "'":
return match.Value; // correctly encoded
case "&":
return "&";
default: // here you can choose:
// to remove entire entity:
return "";
// or just encode that & character
return "&" + match.Value.Substring(1);
}
}));
Upvotes: 1
Reputation: 301
solution :-
byte[] encodedString = Encoding.UTF8.GetBytes(xml);
// Put the byte array into a stream and rewind it to the beginning
MemoryStream ms = new MemoryStream(encodedString);
ms.Flush();
ms.Position = 0;
// Build the XmlDocument from the MemorySteam of UTF-8 encoded bytes
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(ms);
Upvotes: 0
Reputation: 60987
Č
isn't XML it's not even defined in the HTML 4 entity reference. Which btw isn't XML. XML doesn't support all those entities, in fact, it supports very few of them but if you look up the entity and find it, you'll be able to use it's Unicode equivalent, which you can use. e.g. Š
is invalid XML but Š
isn't. (Scaron
was the closest I could find to Ccaron
).
Upvotes: 2