Reputation: 11508
I have a set of HTML documents that I need to parse. They are encoded in Latin-1 (ISO-8859-1). I'm using the Html Agility Pack for parsing.
I have an XPath query containing Swedish characters that I can't get to work. I suspect a mismatch between the encoding of the documents and the encoding in which Visual Studio stores the source file containing the XPath query.
XPath query:
doc.DocumentNode.SelectNodes(@"//h2[text()='Företag']/../div//span[text()='Resultat:']/../div");
The same XPath query works fine in the Firefox extension XPath Checker.
Upvotes: 2
Views: 830
Reputation: 176229
Could you provide more sample code and a sample input document? From the information given, I wrote a small sample program, and it works as expected. Does the following work for you?
Sample document:
<?xml version="1.0" encoding="iso-8859-1"?>
<doc>
<test>Företag</test>
<test>Hallå</test>
</doc>
C#
using System;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        // Load the ISO-8859-1 encoded sample document.
        XPathDocument xpdoc = new XPathDocument(@"sample.xml");
        XPathNavigator nav = xpdoc.CreateNavigator();

        // Select every element whose text node equals 'Företag'.
        XPathNodeIterator iter = nav.Select("//*[text() = 'Företag']");
        while (iter.MoveNext())
        {
            Console.WriteLine(iter.Current.ToString());
        }
    }
}
Output
Företag
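As far as I can tell, the sample works because XPathDocument decodes the file according to its encoding declaration, while the XPath string in the C# source is an ordinary in-memory .NET string, so both sides are compared as the same characters. A minimal sketch of that idea, using only System.Text (the class name EncodingDemo and the byte array are made up for illustration; writing the character as \u00F6 is just one way to make the literal independent of how the source file is saved):

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // "Företag" as ISO-8859-1 bytes: 'ö' is the single byte 0xF6.
        byte[] latin1Bytes = { 0x46, 0xF6, 0x72, 0x65, 0x74, 0x61, 0x67 };

        // Decoding with the matching encoding yields the expected string.
        string decodedAsLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(latin1Bytes);

        // Decoding with the wrong encoding turns 0xF6 into a replacement character.
        string decodedAsUtf8 = Encoding.UTF8.GetString(latin1Bytes);

        // \u00F6 is 'ö'; the escape avoids any dependence on the source file's encoding.
        Console.WriteLine(decodedAsLatin1 == "F\u00F6retag"); // True
        Console.WriteLine(decodedAsUtf8 == "F\u00F6retag");   // False
    }
}
```

If the second comparison is the situation in your program, the document was probably read with the wrong encoding before the XPath query ever ran.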
From the sample code given, it seems that you are using the Microsoft.Windows.Design.Documents.Trees.DocumentNode class. However, the documentation states that this class is not intended to be used directly. May I ask what you are trying to do?
Update: You might be facing an issue with whitespace normalization (which your Firefox add-in may perform, but your code does not). Have you tried changing your XPath by replacing the test text() = 'Företag' with normalize-space() = 'Företag' (just to rule out additional leading or trailing whitespace)?
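To illustrate the difference between the two tests, here is a small sketch that assumes the heading text in your documents is padded with whitespace. The in-memory XML and the class name NormalizeSpaceDemo are made up for the example; your real markup may differ:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class NormalizeSpaceDemo
{
    static void Main()
    {
        // Hypothetical input: the heading text has leading and trailing whitespace.
        string xml = "<doc><h2>\n  F\u00F6retag  </h2></doc>";
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        // Exact comparison against the raw text node fails because of the padding.
        int exact = nav.Select("//h2[text() = 'F\u00F6retag']").Count;

        // normalize-space() trims and collapses whitespace before comparing.
        int normalized = nav.Select("//h2[normalize-space() = 'F\u00F6retag']").Count;

        Console.WriteLine(exact);      // 0
        Console.WriteLine(normalized); // 1
    }
}
```

If the normalize-space() variant matches in your real documents while the text() variant does not, whitespace rather than encoding is the likely culprit.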
Upvotes: 3