GigaJoules

Reputation: 53

What is wrong with my use of XPath in C#?

I'm trying to do a bit of scraping in a C# application.

I am trying to access 4 pieces of information on the following page: https://smstestbed.nist.gov/vds/current

The following function polls a live data feed from a remote machining tool. I have been able to print 'CreationTime' to a terminal, but my XPath use is horrifically clunky. As far as this link seems to suggest, I should be able to do what I am attempting in the two lines after my comment:

"//This should be a far better way of accessing the data but for some reason the second line fails"

Unfortunately, AvailabilityNode comes back null.

public static void PollNIST()
    {
        string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTML
        //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
        // Retrieve raw HTML
        var NISTTargetURL = NISTSourceURL;
        var NISTHttpClient = new HttpClient();
        var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTTargetURL);  // We now have all of the HTML / XML Data as a raw string
                                                                        //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
        XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
        CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

        var elementHeader = CurNISTXML.GetElementsByTagName("Header");
        var curNISTHeader = elementHeader.Item(0);
        var creationTime = curNISTHeader.Attributes[0];  // We actually have the creationTime            
        string CurNISTTime = creationTime.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]

        //This should be a far better way of accessing the data but for some reason the second line fails
        XmlNode AvailabilityNode = CurNISTXML.SelectSingleNode("/table[1]/tbody/tr[1]");  //*[@id="mtconnect content"]/table[1]/tbody/tr[1]/td[7] // Xpath Availability
        var CurNISTStatus = AvailabilityNode.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]


        string CurNistX = ""; //      //*[@id="mtconnect content"]/table[5]/tbody/tr/td[7]
        string CurNistY = ""; //      //*[@id="mtconnect content"]/table[6]/tbody/tr/td[7]

        Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
        Console.WriteLine("NIST Time  : " + creationTime.InnerText);
        Console.WriteLine("NIST Status: " + CurNISTStatus);    
        Console.WriteLine("NIST X Pos.: " + CurNistX);
        Console.WriteLine("NIST Y Pos.: " + CurNistY);
        Console.WriteLine("--------END NIST DATA PACKET--------");

        //var currentNIST = new NISTDataSet()// Create new instance of NISTdata object
    }

Any ideas?

Upvotes: 1

Views: 194

Answers (2)

GigaJoules

Reputation: 53

So it turns out there was nothing wrong with how I was extracting the XML, only with my paths.

public static void PollNIST()
        {
            string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTML
            // string NistXmlUrl = // Someone on stackexchange is claiming that there is another url for the XML but viewsource says otherwise 
            //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
            var NISTHttpClient = new HttpClient();
            var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTSourceURL);  // We now have all of the HTML / XML Data as a raw string
                                                                            //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
            XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
            CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

            // Get CreationTime (WORKING!)
            XmlNodeList elementHeader = CurNISTXML.GetElementsByTagName("Header");
            XmlNode curNISTHeader = elementHeader.Item(0);
            XmlAttribute creationTime = curNISTHeader.Attributes[0];  // We now have the creationTime attribute
            string CurNISTTime = creationTime.InnerText;  //      //*[@id="mtconnect content"]/ul/li[1]

            // Get availability (WORKING!)
            XmlNodeList nodeAvailability = CurNISTXML.GetElementsByTagName("Availability");
            XmlNode availability = nodeAvailability.Item(0); // I think this is maybe a bit of a hackish / improper way to do this?
            string curNISTStatus = availability.InnerText;

            //Get linear tool X Coord.
            XmlNodeList deviceStream = CurNISTXML.GetElementsByTagName("ComponentStream");
            XmlNode linearCompXStream = deviceStream.Item(4);
            string curNISTX = linearCompXStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within

            //Get Linear tool y Coord.            
            XmlNode linearCompYStream = deviceStream.Item(5);
            string curNISTY = linearCompYStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within


            Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
            Console.WriteLine("NIST Time  : " + creationTime.InnerText);
            Console.WriteLine("NIST Status: " + curNISTStatus);    
            Console.WriteLine("NIST X Pos.: " + curNISTX);
            Console.WriteLine("NIST Y Pos.: " + curNISTY);
            Console.WriteLine("--------END NIST DATA PACKET--------");

            //var currentNIST = new NISTDataSet()// Create new instance of NISTdata object
        }

This works nicely.
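
Indexing with Item(4) and Item(5) depends on the order of the ComponentStream elements, so it may break if the feed adds or reorders components. A rough sketch of an alternative is to select nodes by name and attribute with XPath. Everything below is an assumption made for illustration: the sample XML, the namespace URI "urn:example:streams", the attribute values, and the names NistXPathSketch, Read and LoadSample are hypothetical, and the real feed's layout has not been checked here. The sketch also assumes the document declares a default namespace, in which case XPath 1.0 needs a prefix bound through XmlNamespaceManager, because unprefixed names only match elements in no namespace.

```csharp
using System;
using System.Xml;

public static class NistXPathSketch
{
    // Hypothetical sample: namespace URI, element layout and values are assumptions.
    public const string SampleXml =
        "<MTConnectStreams xmlns=\"urn:example:streams\">" +
        "<Header creationTime=\"2020-01-01T00:00:00Z\" />" +
        "<Streams><DeviceStream name=\"Mill\">" +
        "<ComponentStream component=\"Device\" name=\"Mill\">" +
        "<Events><Availability>AVAILABLE</Availability></Events></ComponentStream>" +
        "<ComponentStream component=\"Linear\" name=\"X\">" +
        "<Samples><Position>12.5</Position></Samples></ComponentStream>" +
        "<ComponentStream component=\"Linear\" name=\"Y\">" +
        "<Samples><Position>-3.25</Position></Samples></ComponentStream>" +
        "</DeviceStream></Streams></MTConnectStreams>";

    // Parses the hypothetical sample into an XmlDocument.
    public static XmlDocument LoadSample()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    // Returns the trimmed text of the first node matched by xpath,
    // or null when nothing matches (instead of throwing on a null node).
    public static string Read(XmlDocument doc, string xpath)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        // Bind the prefix "m" to the namespace declared on the root element.
        ns.AddNamespace("m", doc.DocumentElement.NamespaceURI);
        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node?.InnerText.Trim();
    }
}
```

With this sample, Read(doc, "//m:Header/@creationTime") gives the creation time, Read(doc, "//m:Availability") gives the status, and Read(doc, "//m:ComponentStream[@name='X']//m:Position") gives the X position. The same calls would be made on the document loaded from the live feed, with the paths adjusted to whatever the feed actually contains.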

Upvotes: 0

Michael Kay

Reputation: 163645

The XPath expression

/table[1]/tbody/tr[1]

will succeed only if the outermost element of the document is a table element, which seems unlikely. I haven't tried to understand the logic of the page or of your code, but this definitely looks wrong. "/" at the start of a path expression selects from the root of the tree.
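
To illustrate the difference, here is a minimal sketch using a made-up document. The sample XML and the names XPathRootDemo and SelectText are hypothetical, and the sample deliberately has no namespace to keep the paths short. It contrasts a path anchored at the root with one that searches all descendants.

```csharp
using System.Xml;

public static class XPathRootDemo
{
    // Hypothetical miniature document; it only loosely imitates a data feed.
    public const string SampleXml =
        "<Streams><Header creationTime=\"2020-01-01T00:00:00Z\" />" +
        "<DeviceStream><Availability>AVAILABLE</Availability></DeviceStream></Streams>";

    // Returns the text of the first node matched by xpath,
    // or null when the expression selects nothing.
    public static string SelectText(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.InnerText;
    }
}
```

Here SelectText("/table[1]/tbody/tr[1]") returns null because the outermost element is Streams, not table. SelectText("/Streams/DeviceStream/Availability") and SelectText("//Availability") both return "AVAILABLE": the first spells out the path from the root, and the second uses "//" to look for the element at any depth.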

Upvotes: 1
