Reputation: 91
Basically, I want to extract text from an HTML document that contains links like these:
<a href="showthread.php?tid=2632829">1</a>
<a href="showthread.php?tid=2342818">1</a>
<a href="showthread.php?tid=2342818">1</a>
<a href="showthread.php?tid=2342818">1</a>
....
....
All these links are on different lines, with a lot of other markup and scripts in between them.
The catch is that I want to search for "1</a>" in the document and get the link
showthread.php?tid=11digitnumber
I then want to place them in a RichTextBox, one per line, like this:
showthread.php?tid=11digitnumber
showthread.php?tid=11digitnumber
showthread.php?tid=11digitnumber
...
So far I have fetched the source of the web page using:
source = WebBrowser1.DocumentText.ToString()
Earlier I had some luck using:
Dim ss, variable As String
variable = ss.Substring(ss.LastIndexOfAny(">1</a> ") - 27, 27)
output:
showthread.php?tid=11digitnumber
but this only gives me one match, and there are many such links in the document.
Upvotes: 1
Views: 686
Reputation: 2674
You just need a bit of logic, like:
myOriginPoint = your starting point (usually 0)
myLastOccurrence = your last point (usually found with LastIndexOf)
Then you can use a loop and a temporary list, like:
List<String> urls = new List<String>();
while (myOriginPoint < myLastOccurrence)
{
    // retrieve the URL, searching from myOriginPoint
    var urlFound = your logic to retrieve the url
    // save the URL
    urls.Add(urlFound);
    // move to the next position (indexOf = index of the match just found)
    myOriginPoint = indexOf + 1;
}
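As a rough sketch of how that loop could be filled in, the snippet below walks the page source with IndexOf, and for every `">1</a>` marker it looks backwards for the nearest `href="` to cut out the link. The names LinkExtractor and ExtractLinks are made up for illustration, and it assumes each link is written exactly as `<a href="...">1</a>` with double quotes, as in the question.

```csharp
using System;
using System.Collections.Generic;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every anchor
    // that ends with the literal text  ">1</a>  (closing quote + link text "1").
    public static List<string> ExtractLinks(string html)
    {
        const string marker = "\">1</a>";
        const string hrefStart = "href=\"";
        var urls = new List<string>();
        int searchFrom = 0; // plays the role of myOriginPoint

        while (searchFrom < html.Length)
        {
            // Find the next marker at or after the current position.
            int markerPos = html.IndexOf(marker, searchFrom, StringComparison.Ordinal);
            if (markerPos < 0) break; // no more matches

            // Search backwards from the marker for the start of the href value.
            int hrefPos = html.LastIndexOf(hrefStart, markerPos, StringComparison.Ordinal);
            if (hrefPos >= 0)
            {
                int urlStart = hrefPos + hrefStart.Length;
                if (urlStart <= markerPos)
                {
                    urls.Add(html.Substring(urlStart, markerPos - urlStart));
                }
            }

            // Move past this match so the next iteration finds the following one.
            searchFrom = markerPos + marker.Length;
        }
        return urls;
    }
}
```

To show the result one link per line, something like `richTextBox1.Text = string.Join(Environment.NewLine, urls);` should work, where richTextBox1 stands for whatever your control is called. If the markup varies (single quotes, extra attributes after href), plain string searching gets fragile and an HTML parser would likely be a safer choice.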
By the way, you can also use WebClient in .NET, which is a much better way to retrieve data from a URL: http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx
I hope it helps.
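A minimal sketch of the WebClient route, in case it is useful: DownloadString fetches the resource at the given address and returns it as a string, so no WebBrowser control is needed just to read the source. PageFetcher and DownloadPage are hypothetical names used only for this example.

```csharp
using System;
using System.Net;

public static class PageFetcher
{
    // Hypothetical helper: downloads the resource at `address`
    // and returns its content as a string.
    public static string DownloadPage(string address)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(address);
        }
    }
}
```

You would call it with the address of the page you are scraping, for example `string source = PageFetcher.DownloadPage(pageUrl);` where pageUrl is a placeholder for your forum URL. Note that newer .NET versions flag WebClient as obsolete in favor of HttpClient, so this fits best on .NET Framework projects.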
Upvotes: 1