Polo

Reputation: 1820

Need help for parsing HTML in C#

For personal use, I am trying to parse a small HTML page that shows the results of the French soccer championship in a simple grid.

var Url = "http://www.lfp.fr/mobile/ligue1/resultat.asp?code_jr_tr=J01";
WebRequest req = WebRequest.Create(Url);
using (WebResponse result = req.GetResponse())
using (Stream ReceiveStream = result.GetResponseStream())
using (StreamReader sr = new StreamReader(ReceiveStream, Encoding.GetEncoding(0)))
{
    string Line;
    // ReadLine() returns null at end of stream. Testing sr.Read() != -1 here
    // would consume the first character of every line before ReadLine() runs.
    while ((Line = sr.ReadLine()) != null)
    {
        Line = Regex.Replace(Line, @"<(.|\n)*?>", " ");  // replace each tag with a space
        Line = Line.Replace("&nbsp;", "");
        Line = Line.Trim();
        // ... this is where I am stuck
    }
}

After that, I really don't have a clue: should I process the page line by line or read the whole stream at once, and how do I retrieve only each team's name together with the number that follows it, which is the score?

In the end, I want to put both teams and their scores into a list or XML so that I can use it in a phone application.

If anyone has an idea, it would be great. Thanks!

Upvotes: 0

Views: 1571

Answers (4)

JoeCh

Reputation: 1

You could use the Regex.Match method to pull out the team name and score. Examine the HTML to see how each row is built up, then write a pattern that matches one row. This is a common technique in screen scraping.
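
A minimal sketch of that idea follows. The row markup and the cell order (home team, home score, away score, away team) are assumptions made for illustration, not the real structure of the lfp.fr page, so the pattern would have to be adapted after inspecting the actual HTML.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexScrapeSketch
{
    static void Main()
    {
        // Hypothetical row; check the real page to see how rows are built.
        string row = "<tr><td>Lyon</td><td>2</td><td>1</td><td>Monaco</td></tr>";

        // Named groups capture the four assumed cells of one result row.
        var pattern = new Regex(
            @"<td>(?<home>[^<]+)</td>\s*" +
            @"<td>(?<homeScore>\d+)</td>\s*" +
            @"<td>(?<awayScore>\d+)</td>\s*" +
            @"<td>(?<away>[^<]+)</td>",
            RegexOptions.IgnoreCase);

        Match m = pattern.Match(row);
        if (m.Success)
        {
            Console.WriteLine("{0} {1} - {2} {3}",
                m.Groups["home"].Value.Trim(),
                m.Groups["homeScore"].Value,
                m.Groups["awayScore"].Value,
                m.Groups["away"].Value.Trim());
        }
    }
}
```

If a row spans several lines in the real page, matching against the whole response text (for example with Regex.Matches) is likely to be easier than matching line by line.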

Upvotes: 0

Anton Gogolev

Reputation: 115857

You'll need an SgmlReader, which provides an XML-like API over any SGML document (which an HTML document really is).

Upvotes: 0

KV Prajapati

Reputation: 94653

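Take a look at the Html Agility Pack, a .NET library that parses real-world HTML, including malformed markup, into a document you can query with XPath.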
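
As a rough sketch of how that could look, assuming the HtmlAgilityPack package is referenced and that each result is a table row with four cells (a made-up layout, not the verified structure of the page):

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be added to the project

class AgilityPackSketch
{
    static void Main()
    {
        // Hypothetical markup standing in for the downloaded page.
        string html =
            "<table>" +
            "<tr><td>Lyon</td><td>2</td><td>1</td><td>Monaco</td></tr>" +
            "<tr><td>Lille</td><td>0</td><td>0</td><td>Nice</td></tr>" +
            "</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 4) continue; // skip header or spacer rows

            Console.WriteLine("{0} {1} - {2} {3}",
                cells[0].InnerText.Trim(), cells[1].InnerText.Trim(),
                cells[2].InnerText.Trim(), cells[3].InnerText.Trim());
        }
    }
}
```

With the real page, the response stream from the question could be passed to doc.Load instead of calling LoadHtml on a string.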

Upvotes: 7

Neil Barnwell

Reputation: 42165

You could put the stream into an XmlDocument, allowing you to query via something like XPath. Or you could use LINQ to XML with an XDocument.

It's not perfect though, because HTML files aren't always well-formed XML (don't we know it!), but it's a simple solution using stuff already available in the framework.
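
A small sketch of the LINQ to XML route, under the assumption that the markup is well-formed and that each row holds four cells (home team, home score, away score, away team). The sample table and the output element names are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class LinqToXmlSketch
{
    static void Main()
    {
        // Hypothetical, well-formed markup standing in for the downloaded page.
        string xhtml =
            "<table>" +
            "<tr><td>Lyon</td><td>2</td><td>1</td><td>Monaco</td></tr>" +
            "<tr><td>Lille</td><td>0</td><td>0</td><td>Nice</td></tr>" +
            "</table>";

        // Throws XmlException if the markup is not well-formed XML.
        XDocument page = XDocument.Parse(xhtml);

        // Build one <match> element per row that has at least four cells.
        var matches = new XElement("matches",
            from tr in page.Descendants("tr")
            let cells = tr.Elements("td").Select(td => td.Value.Trim()).ToList()
            where cells.Count >= 4
            select new XElement("match",
                new XElement("home", new XAttribute("score", cells[1]), cells[0]),
                new XElement("away", new XAttribute("score", cells[2]), cells[3])));

        Console.WriteLine(matches);
    }
}
```

Entities such as &nbsp; are not defined in plain XML, so a real page containing them would make XDocument.Parse throw unless they are replaced first.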

Upvotes: 1
