Reputation: 11
<td class="lineitem">964.00 oz</td>
<td class="lineitem" align="right" bgcolor="#141414"><font color="#33ff66">230.00</td>
<td class="lineitem" align="right">$460</td>
<td class="lineitem" align="right">1.00</td>
<td class="lineitem" align="right">$2.00</td>
From this I am trying to get 964.00, 230.00, 460, 1.00 and 2.00, and save them to strings to use later on.
Thanks in advance
I have tried:
string bleh = ("http://www.drugrunners.net/quickBuySummary.cfm?");
string[] qual = Regex.Split(bleh, "<td class=");
for (int i = 1; i < qual.Length; i++)
{
    switch (i)
    {
        case 1:
            Details[0] = Regex.Split(qual[i], "\">")[0];
            button3.Text = Regex.Split(qual[i], "\">")[1];
            break;
    }
}
Upvotes: 1
Views: 133
Reputation: 13784
What you need is an HTML parsing library such as HtmlAgilityPack.
Here is an example:
// HtmlDocument.Load expects a file path or stream; HtmlWeb fetches a URL
HtmlDocument doc = new HtmlWeb().Load("http://yourUrl");
var findclasses = doc.DocumentNode.Descendants("td").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("lineitem")
);
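To get from the selected cells to the plain numbers asked for in the question, something along these lines might work. This is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced and that the HTML is already in a string. The names LineItems and Read are made up for illustration, and the cleanup simply drops a leading dollar sign and anything after the first space.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, not part of the .NET base library

public static class LineItems
{
    // Hypothetical helper: returns the numeric text of every td whose class contains "lineitem".
    public static List<string> Read(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file or URL

        var values = new List<string>();
        var cells = doc.DocumentNode.SelectNodes("//td[contains(@class,'lineitem')]");
        if (cells == null)
        {
            return values; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode cell in cells)
        {
            string text = cell.InnerText.Trim();           // e.g. "964.00 oz" or "$460"
            values.Add(text.TrimStart('$').Split(' ')[0]); // drop "$" prefix and trailing unit
        }
        return values;
    }
}
```

Calling LineItems.Read(pageHtml) would then give a list of strings that can be stored for later use.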
Upvotes: 1
Reputation: 3049
The regex pattern can be >\$?([\d\.]+).*<
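Meaning: > matches the end of the opening tag, \$? matches an optional dollar sign, ([\d\.]+) captures the digits and dots, and .*< skips any trailing text (such as " oz") up to the next <.
In PHP,
preg_match_all(
'@>\\$?([\\d\\.]+).*<@',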
'<td class="lineitem">964.00 oz</td>
<td class="lineitem" align="right" bgcolor="#141414"><font color="#33ff66">230.00</td>
<td class="lineitem" align="right">$460</td>
<td class="lineitem" align="right">1.00</td>
<td class="lineitem" align="right">$2.00</td>',
$matches
);
returns
$matches => array (
0 => array (
0 => '>964.00 oz<',
1 => '>230.00<',
2 => '>$460<',
3 => '>1.00<',
4 => '>$2.00<',
),
1 => array (
0 => '964.00',
1 => '230.00',
2 => '460',
3 => '1.00',
4 => '2.00',
),
)
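Since the question is in C#, the same pattern can be tried with System.Text.RegularExpressions. A minimal sketch, assuming the HTML is already in a string with one cell per line. The names CellNumbers and Extract are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellNumbers
{
    // '>' then an optional '$', then digits/dots (captured), then anything up to a '<'.
    // '.' does not match '\n' by default, so each match stays on one line.
    private static readonly Regex Pattern = new Regex(@">\$?([\d.]+).*<");

    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(html))
        {
            values.Add(m.Groups[1].Value); // group 1 holds the number only
        }
        return values;
    }

    public static void Main()
    {
        string html =
            "<td class=\"lineitem\">964.00 oz</td>\n" +
            "<td class=\"lineitem\" align=\"right\" bgcolor=\"#141414\"><font color=\"#33ff66\">230.00</td>\n" +
            "<td class=\"lineitem\" align=\"right\">$460</td>\n" +
            "<td class=\"lineitem\" align=\"right\">1.00</td>\n" +
            "<td class=\"lineitem\" align=\"right\">$2.00</td>";

        // prints: 964.00 , 230.00 , 460 , 1.00 , 2.00
        Console.WriteLine(string.Join(" , ", Extract(html)));
    }
}
```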
Upvotes: 0
Reputation: 3653
My question is: does what you're doing in your example work? Your string bleh line tells me otherwise, because it holds only the URL, not the page's HTML. You're evidently working with a page that requires authentication, so you can't simply load the document with HtmlDocument.Load. You will have more to do than any of these answers show before you can scrape the page. You will need to figure out:
You can look for each of these points separately on S.O. or elsewhere.
You can also take the other path and download the page manually, if that works for you. You can then open the saved file with an IO.File method and feed its contents to HtmlDocument.LoadHtml(), meaning you can skip to number 3.
Upvotes: 0
Reputation: 494
You need to retrieve the remote web page and 'scrape' it using a library like Html Agility Pack:
About retrieving it, this SO link is useful:
protected string getHtml(string url){
    using (WebClient client = new WebClient()){
        // Download the page's HTML as a string
        return client.DownloadString(url);
    }
}
Then, once you create an HTML document out of the string (I'm not sure how to do it, but it should be a no-brainer), you can parse it using Html Agility Pack and XPath.
Ultimately, I think you can get what you want by following this CodeProject tutorial, and you'd get something like this:
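protected void ClickMeButton_Click(object sender, EventArgs e){
    // Parse the downloaded string into an Html Agility Pack document
    var document = new HtmlDocument();
    document.LoadHtml(getHtml("http://url.to.your/page"));
    var tdTags = document.DocumentNode.SelectNodes("//td"); // use an XPath expression to select contents
    var values = new List<string>(); // one entry per td cell
    if (tdTags != null) // SelectNodes returns null when nothing matches
    {
        foreach (var tdTag in tdTags){
            values.Add(tdTag.InnerHtml);
        }
    }
}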
Upvotes: 0
Reputation: 231
new System.Text.RegularExpressions.Regex("^<td class=\"lineitem\".*>(?<number>.*)</td>$")
This will capture 964.00 oz, 230.00, $460, 1.00 and $2.00 respectively from the lines you posted.
It does require you to run it against one row at a time, and you will still have to decide what to do with the units and dollar signs.
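As a rough illustration of running it one row at a time, the sketch below applies the pattern to a single line and then strips everything that is not a digit or a dot. The cleanup step and the names RowParser and ParseRow are assumptions added for this example, not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RowParser
{
    // The pattern from above, anchored to a single row.
    private static readonly Regex Row =
        new Regex("^<td class=\"lineitem\".*>(?<number>.*)</td>$");

    // Assumed cleanup: remove anything that is not a digit or a dot ("$", " oz", ...).
    private static readonly Regex NonNumeric = new Regex(@"[^\d.]");

    // Returns the cleaned number, or null when the line is not a lineitem cell.
    public static string ParseRow(string line)
    {
        Match m = Row.Match(line.Trim());
        if (!m.Success)
        {
            return null;
        }
        return NonNumeric.Replace(m.Groups["number"].Value, "");
    }

    public static void Main()
    {
        string[] rows =
        {
            "<td class=\"lineitem\">964.00 oz</td>",
            "<td class=\"lineitem\" align=\"right\">$460</td>"
        };
        foreach (string row in rows)
        {
            Console.WriteLine(ParseRow(row)); // 964.00, then 460
        }
    }
}
```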
Upvotes: 0