Reputation: 53
I'm trying to do a bit of scraping in a C# application.
I am trying to access 4 pieces of information on the following page: https://smstestbed.nist.gov/vds/current
The following function polls a live data feed from a remote machining tool. I have been able to print 'CreationTime' to a terminal, but my XPath use is horrifically clunky. As far as This Link seems to suggest, I should be able to do what I am doing in the two lines after my comment
"//This should be a far better way of accessing the data but for some reason the second line fails"
Unfortunately, AvailabilityNode comes back null.
public static void PollNIST()
{
string NISTSourceURL = "https://smstestbed.nist.gov/vds/current"; // Gives us a human friendly reference to the HTML
//-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
// Retrieve raw HTML
var NISTTargetURL = NISTSourceURL;
var NISTHttpClient = new HttpClient();
var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTTargetURL); // We now have all of the HTML / XML Data as a raw string
//Console.WriteLine(NISTXMLRaw.Result); // Prints the resulting HTML to a terminal as a debug tool (Works)
XmlDocument CurNISTXML = new XmlDocument(); // Generate Blank XML Doc
CurNISTXML.LoadXml(NISTXMLRaw.Result); // This (".result") passes the actual string?, should then be loaded into new XML file
var elementHeader = CurNISTXML.GetElementsByTagName("Header");
var curNISTHeader = elementHeader.Item(0);
var creationTime = curNISTHeader.Attributes[0]; // We actually have the creationTime
string CurNISTTime = creationTime.InnerText; // //*[@id="mtconnect content"]/ul/li[1]
//This should be a far better way of accessing the data but for some reason the second line fails
XmlNode AvailabilityNode = CurNISTXML.SelectSingleNode("/table[1]/tbody/tr[1]"); //*[@id="mtconnect content"]/table[1]/tbody/tr[1]/td[7] // Xpath Availability
var CurNISTStatus = AvailabilityNode.InnerText; // //*[@id="mtconnect content"]/ul/li[1]
string CurNistX = ""; // //*[@id="mtconnect content"]/table[5]/tbody/tr/td[7]
string CurNistY = ""; // //*[@id="mtconnect content"]/table[6]/tbody/tr/td[7]
Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
Console.WriteLine("NIST Time : " + creationTime.InnerText);
Console.WriteLine("NIST Status: " + CurNISTStatus);
Console.WriteLine("NIST X Pos.: " + CurNistX);
Console.WriteLine("NIST Y Pos.: " + CurNistY);
Console.WriteLine("--------END NIST DATA PACKET--------");
//var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
}
Any ideas?
Upvotes: 1
Views: 194
Reputation: 53
So it turns out there was nothing wrong with how I was loading the XML, only with my paths.
public static void PollNIST()
{
string NISTSourceURL = "https://smstestbed.nist.gov/vds/current"; // Gives us a human friendly reference to the HTML
// string NistXmlUrl = // Someone on stackexchange is claiming that there is another url for the XML but viewsource says otherwise
//-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
var NISTHttpClient = new HttpClient();
var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTSourceURL); // We now have all of the HTML / XML Data as a raw string
//Console.WriteLine(NISTXMLRaw.Result); // Prints the resulting HTML to a terminal as a debug tool (Works)
XmlDocument CurNISTXML = new XmlDocument(); // Generate Blank XML Doc
CurNISTXML.LoadXml(NISTXMLRaw.Result); // .Result blocks until the download finishes and returns the string, which is then parsed into the XML document
// Get CreationTime (WORKING!)
XmlNodeList elementHeader = CurNISTXML.GetElementsByTagName("Header");
XmlNode curNISTHeader = elementHeader.Item(0);
XmlAttribute creationTime = curNISTHeader.Attributes[0]; // We now have the creationTime attribute
string CurNISTTime = creationTime.InnerText; // //*[@id="mtconnect content"]/ul/li[1]
// Get availability (WORKING!)
XmlNodeList nodeAvailability = CurNISTXML.GetElementsByTagName("Availability");
XmlNode availability = nodeAvailability.Item(0); // I think this is maybe a bit of a hackish / improper way to do this?
string curNISTStatus = availability.InnerText;
//Get linear tool X Coord.
XmlNodeList deviceStream = CurNISTXML.GetElementsByTagName("ComponentStream");
XmlNode linearCompXStream = deviceStream.Item(4);
string curNISTX = linearCompXStream.InnerText; // We do not need to break down the nodes any further as the value is the only text within
//Get linear tool Y Coord.
XmlNode linearCompYStream = deviceStream.Item(5);
string curNISTY = linearCompYStream.InnerText; // We do not need to break down the nodes any further as the value is the only text within
Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
Console.WriteLine("NIST Time : " + creationTime.InnerText);
Console.WriteLine("NIST Status: " + curNISTStatus);
Console.WriteLine("NIST X Pos.: " + curNISTX);
Console.WriteLine("NIST Y Pos.: " + curNISTY);
Console.WriteLine("--------END NIST DATA PACKET--------");
//var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
}
This works nicely.
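One way the lookups above could be made less dependent on ordering, as a sketch rather than a tested drop-in: read the header attribute by name instead of by position, and pick each ComponentStream by an attribute instead of by index. The inline sample XML, the NistSketch class, its helper methods, and the name="X" / name="Y" attribute values are all assumptions made for illustration; the real feed may be structured differently.

```csharp
using System;
using System.Xml;

public static class NistSketch
{
    // Hypothetical, trimmed-down stand-in for the feed; the real document is larger
    // and its element and attribute names should be checked against the live XML.
    public const string SampleXml =
        "<MTConnectStreams>" +
        "<Header sender=\"demo\" creationTime=\"2020-01-01T00:00:00Z\" />" +
        "<Streams><DeviceStream>" +
        "<ComponentStream component=\"Linear\" name=\"X\"><Samples><Position>1.5</Position></Samples></ComponentStream>" +
        "<ComponentStream component=\"Linear\" name=\"Y\"><Samples><Position>2.5</Position></Samples></ComponentStream>" +
        "</DeviceStream></Streams>" +
        "</MTConnectStreams>";

    // Reads creationTime by attribute name, so the attribute order no longer matters.
    // Returns null when there is no Header element or no such attribute.
    public static string GetCreationTime(XmlDocument doc)
    {
        var header = doc.GetElementsByTagName("Header").Item(0) as XmlElement;
        if (header == null || !header.HasAttribute("creationTime")) return null;
        return header.GetAttribute("creationTime");
    }

    // Returns the text of the first ComponentStream whose name attribute matches,
    // or null when none does. This replaces the positional Item(4) / Item(5) lookups.
    public static string GetStreamText(XmlDocument doc, string name)
    {
        foreach (XmlNode node in doc.GetElementsByTagName("ComponentStream"))
        {
            var element = node as XmlElement;
            if (element != null && element.GetAttribute("name") == name)
                return element.InnerText;
        }
        return null;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        Console.WriteLine("NIST Time  : " + GetCreationTime(doc));
        Console.WriteLine("NIST X Pos.: " + GetStreamText(doc, "X"));
        Console.WriteLine("NIST Y Pos.: " + GetStreamText(doc, "Y"));
    }
}
```

If the feed ever adds or reorders streams, a lookup keyed on an attribute keeps returning the same axis, whereas Item(4) would silently start returning something else.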
Upvotes: 0
Reputation: 163645
The XPath expression
/table[1]/tbody/tr[1]
will succeed only if the outermost element of the document is a table element, which seems unlikely. I haven't tried to understand the logic of the page or of your code, but this definitely looks wrong. "/" at the start of a path expression selects from the root of the tree.
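To illustrate the point, here is a small sketch, not a drop-in fix. The inline XML, the urn:example:streams namespace, the "m" prefix, and the XPathSketch class are made up for the example; the real feed's element names and namespace should be read from the document itself. It contrasts a rooted path that cannot match with a "//" path that searches the whole tree, and it shows the prefix mapping that SelectSingleNode needs when the document declares a default namespace (worth checking on the root element of the real feed).

```csharp
using System;
using System.Xml;

public static class XPathSketch
{
    // Hypothetical sample; the namespace URI is invented for this example.
    public const string SampleXml =
        "<MTConnectStreams xmlns=\"urn:example:streams\">" +
        "<Header creationTime=\"2020-01-01T00:00:00Z\" />" +
        "<Streams><Events><Availability>AVAILABLE</Availability></Events></Streams>" +
        "</MTConnectStreams>";

    // Evaluates an XPath against the document, mapping the prefix "m" to whatever
    // namespace the root element uses. Returns null when nothing matches.
    public static string Select(XmlDocument doc, string xpath)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("m", doc.DocumentElement.NamespaceURI);
        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        // Rooted at "/": only matches if the root element itself is <table>.
        Console.WriteLine(Select(doc, "/table[1]/tbody/tr[1]") ?? "(no match)");

        // "//" searches the whole tree, but an unprefixed name only matches
        // elements in no namespace, so this still finds nothing here.
        Console.WriteLine(Select(doc, "//Availability") ?? "(no match)");

        // With the prefix mapped to the document's namespace, the node is found.
        Console.WriteLine(Select(doc, "//m:Availability") ?? "(no match)");

        // A rooted path works once it starts from the actual root element.
        Console.WriteLine(Select(doc, "/m:MTConnectStreams/m:Header/@creationTime") ?? "(no match)");
    }
}
```

This would also explain why GetElementsByTagName("Header") can succeed where an unprefixed XPath fails: the former compares qualified names, while XPath treats an unprefixed name as belonging to no namespace.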
Upvotes: 1