Extracting data from an XML document without using an XML parser

Question

Here are some lines of the document:

  
    Technical Fouls

    
       
         
            Players
          
           
            DAL
            
              None

          
           
            MIA
            
              Mike Miller
            
              Mike Miller, Jr.

I'm interested in extracting None, Mike Miller, and Mike Miller, Jr. from this. I tried various XML parsers, but (1) the performance is abysmal and (2) the document is apparently not well-formed XML.

One thing I've been thinking about is stripping the document of newlines, splitting it at something like , seeing which lines contain data (probably using StartsWith()), and extracting it with a regex. That would be efficient enough for my program (it doesn't really matter that it takes half a second when downloading the document takes five seconds), but I'm interested in alternative solutions.
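For reference, the split-and-filter idea could be sketched roughly like this. It is written in Python rather than .NET, with str.startswith standing in for StartsWith(). The real markup isn't visible here, so the `<tr>`/`<td>` tags, the `class="player"` attribute, and the function name `extract_cells` are all assumptions made up for illustration.

```python
import re

# Hypothetical fragment: the tags and class names below are assumptions,
# not the real document's markup.
SAMPLE = """
<tr><td class="team">DAL</td>
<td class="player">None</td></tr>
<tr><td class="team">MIA</td>
<td class="player">Mike Miller</td>
<td class="player">Mike Miller, Jr.</td></tr>
"""

# Matches any single tag so that leftover markup can be stripped from a chunk.
TAG = re.compile(r"<[^>]+>")


def extract_cells(doc, delimiter, prefix):
    """Flatten the document, split it at `delimiter`, and keep the text of
    every chunk that starts with `prefix`."""
    flat = doc.replace("\r", "").replace("\n", "")
    results = []
    for chunk in flat.split(delimiter):
        chunk = chunk.strip()
        if chunk.startswith(prefix):
            text = TAG.sub("", chunk).strip()
            if text:
                results.append(text)
    return results


players = extract_cells(SAMPLE, "</td>", '<td class="player">')
```

This only works while every data cell begins with exactly the same prefix. Any change in attribute order, quoting, or nesting breaks it silently, which is the usual argument for a tolerant parser instead.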

Paul Creasey · Accepted Answer

Relevant

HTML generally isn't well-formed XML. I suggest you use something like the HTML Agility Pack.
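The HTML Agility Pack is a .NET library, and the sketch below does not use its API. It only shows the same idea, reading messy markup with a parser that tolerates unclosed tags, using Python's standard html.parser module as a stand-in. The `td` tag, the `class="player"` attribute, and the names `PlayerCellParser` and `extract_players` are hypothetical, since the real document's markup isn't shown.

```python
from html.parser import HTMLParser


class PlayerCellParser(HTMLParser):
    """Collects the text of <td class="player"> cells (hypothetical markup),
    even when closing tags are missing."""

    def __init__(self):
        super().__init__()
        self.players = []
        self._buffer = None  # None while outside a player cell

    def flush(self):
        # Store whatever text the current player cell has accumulated.
        if self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.players.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        # A new cell or row implicitly ends any cell that was left open.
        if tag in ("td", "tr"):
            self.flush()
        if tag == "td" and ("class", "player") in attrs:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self.flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_players(html):
    parser = PlayerCellParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a cell left open at the end of the input
    return parser.players


# Deliberately malformed input: one unclosed <td> and a stray <br>.
MESSY = ('<table><tr><td class="team">MIA<td class="player">Mike Miller</td>'
         '<td class="player">Mike Miller, Jr.<br></tr></table>')
found = extract_players(MESSY)
```

Unlike an XML parser, this does not raise on the unclosed cell. It just reports the tags it sees and leaves it to the handler to decide where a cell ends.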
