Kave

Reputation: 11

I need to get data from a website and save it to a string. The relevant part of the page is:

<td class="lineitem">964.00 oz</td>
<td class="lineitem" align="right" bgcolor="#141414"><font color="#33ff66">230.00</td>
<td class="lineitem" align="right">$460</td>      
<td class="lineitem" align="right">1.00</td>
<td class="lineitem" align="right">$2.00</td>

From this I am trying to get 964.00, 230.00, 460, 1.00 and 2.00 and save them as strings to use later on.

Thanks in advance

I have tried:

string bleh = ("http://www.drugrunners.net/quickBuySummary.cfm?");
string[] qual = Regex.Split(bleh, "<td class=");
for (int i = 1; i < qual.Length; i++)
{
    switch (i)
    {
        case 1:
            Details[0] = Regex.Split(qual[i], "\">")[0];
            button3.Text = Regex.Split(qual[i], "\">")[1];
            break;
    }
}

Upvotes: 1

Views: 133

Answers (5)

BRAHIM Kamel

Reputation: 13784

What you need is a web-scraping library like HtmlAgilityPack. Here is an example:

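// Requires: using System.Linq; using HtmlAgilityPack;
// HtmlWeb downloads and parses the page; HtmlDocument.Load expects a file path or stream, not a URL
HtmlDocument doc = new HtmlWeb().Load("http://yourUrl");
var findclasses = doc.DocumentNode.Descendants("td").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("lineitem")
);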
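The cells that findclasses selects still carry units and currency signs in their text (964.00 oz, $460). A minimal sketch for reducing one cell's text to the bare number, assuming the values are plain decimals with no thousands separators or minus signs. CellText and ExtractNumber are hypothetical names, and only System.Text.RegularExpressions is used:

```csharp
using System.Text.RegularExpressions;

public static class CellText
{
    // First run of digits with an optional decimal part, e.g. "964.00" in "964.00 oz"
    // or "460" in "$460". Assumes plain non-negative values without thousands separators.
    private static readonly Regex NumberPattern = new Regex(@"\d+(\.\d+)?");

    // Returns the numeric part of a cell's text as a string, or null when there is none.
    public static string ExtractNumber(string innerText)
    {
        Match m = NumberPattern.Match(innerText ?? string.Empty);
        return m.Success ? m.Value : null;
    }
}
```

With the document loaded as above, something like findclasses.Select(td => CellText.ExtractNumber(td.InnerText)).ToList() should give the five values as strings.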

Upvotes: 1

PaulH

Reputation: 3049

The regex pattern can be >\$?([\d\.]+).*<

Meaning:

  • search for something between > and <
  • \$? is the optional $
  • () indicate a subpattern to match, returned as $matches[1]
  • [] indicate the characters to match; the + indicates one or more
  • \d is a digit
  • \. is a dot
  • .* is anything following (on the same line)

In PHP:

preg_match_all(
   '@>\\$?([\\d\\.]+).*<@', 
   '<td class="lineitem">964.00 oz</td>
    <td class="lineitem" align="right" bgcolor="#141414"><font color="#33ff66">230.00</td>
    <td class="lineitem" align="right">$460</td>      
    <td class="lineitem" align="right">1.00</td>
    <td class="lineitem" align="right">$2.00</td>', 
   $matches
);

fills $matches with:

$matches => array (
  0 => array (
    0 => '>964.00 oz<',
    1 => '>230.00<',
    2 => '>$460<',
    3 => '>1.00<',
    4 => '>$2.00<',
  ),
  1 => array (
    0 => '964.00',
    1 => '230.00',
    2 => '460',
    3 => '1.00',
    4 => '2.00',
  ),
)
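Since the question is in C#, here is a sketch of the same pattern with System.Text.RegularExpressions, assuming the rows arrive one per line as in the question (the .* part would over-match if all cells were on a single line). LineItemValues and Extract are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineItemValues
{
    // '>' , optional '$', captured digits/dots, rest of the cell text, '<'.
    // Without RegexOptions.Singleline, '.' does not cross '\n', so ".*" stays within one row.
    private static readonly Regex Pattern = new Regex(@">\$?([\d.]+).*<");

    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(html))
        {
            values.Add(m.Groups[1].Value); // group 1 plays the role of $matches[1] in PHP
        }
        return values;
    }
}
```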

Upvotes: 0

Tyress

Reputation: 3653

My question is: does what you're doing in your example work at all? Your string bleh line suggests otherwise, because it assigns the URL text itself to bleh, so Regex.Split runs on the address rather than on the page's HTML. You also appear to be working with a page that requires authentication, so you can't simply load the document with HtmlDocument.Load. Before you can scrape the page, you will need more than the other answers show. You will need to figure out:

  1. How to make a proper HTTP request
  2. How to send the request with authentication (cookies, POST data, whatever the site requires)
  3. How to read the response for the page you want and parse it with HtmlAgilityPack
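Steps 1 to 3 might look roughly like this with HttpClient and a CookieContainer. This is a sketch under assumptions: the login URL, the form field names username and password, and the idea that the site keeps the session in a cookie are all hypothetical and must be checked against the site's actual login form. AuthenticatedScraper, BuildLoginForm and LoginAndFetchAsync are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class AuthenticatedScraper
{
    // Hypothetical field names: read the real ones from the site's login <form>.
    public static FormUrlEncodedContent BuildLoginForm(string user, string password)
    {
        return new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["username"] = user,
            ["password"] = password,
        });
    }

    // Step 1-2: POST the login form; the CookieContainer keeps any session cookie it sets.
    // Step 3: GET the protected page with the same client and return its HTML.
    public static async Task<string> LoginAndFetchAsync(
        Uri loginUrl, Uri pageUrl, string user, string password)
    {
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler)) // disposing the client also disposes the handler
        {
            HttpResponseMessage login = await client.PostAsync(loginUrl, BuildLoginForm(user, password));
            login.EnsureSuccessStatusCode();
            return await client.GetStringAsync(pageUrl);
        }
    }
}
```

EnsureSuccessStatusCode only checks the HTTP status; many sites answer a failed login with 200 as well, so checking the returned HTML for something only a logged-in user sees is safer. The returned string is what you would pass to HtmlDocument.LoadHtml for step 3.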

You can look up each of these points separately on Stack Overflow or elsewhere.

Alternatively, if it works for you, download the page manually, read the saved file with a System.IO.File method such as File.ReadAllText, and pass the resulting string to HtmlDocument.LoadHtml(). That lets you skip straight to step 3.
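For the manual-download route, a minimal sketch of reading the saved file. To keep it free of package dependencies it swaps in a small regular expression where the answer uses HtmlDocument.LoadHtml; with HtmlAgilityPack installed, you would pass the File.ReadAllText result to LoadHtml instead. SavedPage and ReadLineItemCells are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class SavedPage
{
    // Regex stand-in for an HTML parser: finds each <td class="lineitem" ...>...</td> cell.
    // Adequate for this small, regular snippet, not for arbitrary HTML.
    private static readonly Regex Cell = new Regex(
        @"<td class=""lineitem""[^>]*>(.*?)</td>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Strips nested tags such as the <font> wrapper around 230.00.
    private static readonly Regex Tag = new Regex("<[^>]+>");

    public static List<string> ReadLineItemCells(string path)
    {
        string html = File.ReadAllText(path); // the manually downloaded page
        var cells = new List<string>();
        foreach (Match m in Cell.Matches(html))
        {
            cells.Add(Tag.Replace(m.Groups[1].Value, string.Empty).Trim());
        }
        return cells;
    }
}
```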

Upvotes: 0

You need to retrieve the remote web page and 'scrape' it using a library like Html Agility Pack.

For retrieving it, this Stack Overflow link is useful:

// Requires: using System.Net;
protected string getHtml(string url)
{
    using (var client = new WebClient()) // WebClient is IDisposable
    {
        return client.DownloadString(url);
    }
}

Once you have the HTML as a string, load it into an HtmlDocument with LoadHtml(), then parse it using Html Agility Pack and XPath.

Ultimately, I think you can get what you want by following this CodeProject tutorial, and you'd get something like this:

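protected void ClickMeButton_Click(object sender, EventArgs e)
{
    // Requires: using System.Collections.Generic; using HtmlAgilityPack;
    var document = new HtmlDocument();
    document.LoadHtml(getHtml("http://url.to.your/page")); // build the document from the downloaded string
    var tdTags = document.DocumentNode.SelectNodes("//td[@class='lineitem']"); // XPath selecting the lineitem cells
    var savedValues = new List<string>();
    if (tdTags != null) // SelectNodes returns null when nothing matches
    {
        foreach (var tdTag in tdTags)
        {
            savedValues.Add(tdTag.InnerText); // InnerText drops nested tags such as <font>
        }
    }
}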

Upvotes: 0

Mårten

Reputation: 231

new System.Text.RegularExpressions.Regex("^<td class=\"lineitem\".*>(?<number>.*)</td>$")

This captures 964.00 oz, 230.00, $460, 1.00 and $2.00 respectively from the lines you posted.

It does require you to run it one row at a time, and you will still have to decide what to do with the units (oz, $).
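Applying the pattern one row at a time could look like the sketch below. The third row in the question has trailing spaces after </td>, which the $ anchor rejects, so each line is trimmed first. The split into value and unit (treating a leading $ as the unit) is a hypothetical choice for the units problem; RowParser and Parse are illustrative names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowParser
{
    // The pattern from the answer; anchored with ^ and $, so it is applied per row.
    private static readonly Regex Row = new Regex("^<td class=\"lineitem\".*>(?<number>.*)</td>$");

    // Hypothetical split of a captured cell such as "964.00 oz" or "$460" into value and unit.
    private static readonly Regex ValueAndUnit =
        new Regex(@"^(?<currency>\$?)(?<value>[\d.]+)\s*(?<unit>.*)$");

    public static List<(string Value, string Unit)> Parse(string html)
    {
        var result = new List<(string Value, string Unit)>();
        foreach (string rawLine in html.Split('\n'))
        {
            // Trim first: trailing spaces or a '\r' after </td> would defeat the $ anchor.
            Match row = Row.Match(rawLine.Trim());
            if (!row.Success) continue;

            Match parts = ValueAndUnit.Match(row.Groups["number"].Value);
            if (!parts.Success) continue;

            string unit = parts.Groups["currency"].Value == "$" ? "$" : parts.Groups["unit"].Value;
            result.Add((parts.Groups["value"].Value, unit));
        }
        return result;
    }
}
```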

Upvotes: 0
