Reputation:
private void ParseFilesNames()
{
    using (WebClient client = new WebClient())
    {
        try
        {
            for (int i = 0; i < 15; i++)
            {
                string urltoparse = "mysite.com/gallery/albums/from_old_gallery/" + i;
                string s = client.DownloadString(urltoparse);
                int index = -1;
                while (true)
                {
                    string firstTag = "HREF=";
                    string secondtag = ">";
                    index = s.IndexOf(firstTag, 0);
                    int endIndex = s.IndexOf(secondtag, index);
                    if (index < 0)
                    {
                        break;
                    }
                    else
                    {
                        string filename = s.Substring(index + firstTag.Length, endIndex - index - firstTag.Length);
                    }
                }
            }
        }
        catch (Exception err)
        {
        }
    }
}
The problem is with the Substring call. The arguments index + firstTag.Length, endIndex - index - firstTag.Length are wrong.
What I need to get is the string between HREF="
and ">
The whole string looks like: HREF="myimage.jpg">
I need to get only myimage.jpg
Sometimes it can be myimage465454.jpg, so in every case I need to get only the file name, for example only myimage465454.jpg.
What should I change in the Substring call?
Upvotes: 1
Views: 737
Reputation: 6286
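If you are sure that your string will always have the form HREF="yourpath">, just apply the following:
string yourInitialString = @"HREF=""myimage.jpg"">";
// Quotes inside a verbatim string are doubled; Replace needs both the old and the new value
string parsedString = yourInitialString.Replace(@"HREF=""", "").Replace(@""">", "");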
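For the Substring approach the question asks about, here is a minimal sketch of one possible fix. In the question's loop the tags are HREF= and >, so the extracted value keeps its surrounding quotes; the search also restarts at position 0 every time, so the loop never advances, and endIndex is computed before the index < 0 check. The class and method names below (HrefParser, ExtractHrefValues) are hypothetical, and the sketch assumes every value is immediately followed by "> with no other attributes in between.

```csharp
using System;
using System.Collections.Generic;

public static class HrefParser
{
    // Hypothetical helper: collects every value found between HREF=" and ">.
    public static List<string> ExtractHrefValues(string html)
    {
        const string firstTag = "HREF=\"";   // includes the opening quote
        const string secondTag = "\">";      // includes the closing quote
        var results = new List<string>();
        int searchFrom = 0;

        while (true)
        {
            // Continue from where the previous match ended, not from 0.
            int index = html.IndexOf(firstTag, searchFrom, StringComparison.OrdinalIgnoreCase);
            if (index < 0)
            {
                break; // no more links
            }

            int valueStart = index + firstTag.Length;
            int endIndex = html.IndexOf(secondTag, valueStart, StringComparison.Ordinal);
            if (endIndex < 0)
            {
                break; // unterminated tag, stop instead of throwing
            }

            // Substring takes a start position and a LENGTH, not an end position.
            results.Add(html.Substring(valueStart, endIndex - valueStart));
            searchFrom = endIndex + secondTag.Length;
        }

        return results;
    }
}
```

Calling HrefParser.ExtractHrefValues(s) on the downloaded page text should then give one entry per link, without the quotes.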
If you need to parse the href values of HTML links, the best option is to use the HtmlAgilityPack library.
Solution with Html Agility Pack:
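HtmlWeb htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = htmlWeb.Load(Url); // Url is the address of the page to parse
// SelectNodes returns null when no node matches, so check before iterating
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // Get the value of the HREF attribute
        string hrefValue = link.GetAttributeValue("href", string.Empty);
    }
}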
To install HtmlAgilityPack, run the following command in the Package Manager Console:
PM> Install-Package HtmlAgilityPack
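Since the question wants only the file name, an href value that contains a path may need one more step. The sketch below is an assumption about what such values look like (forward slashes, an optional query string or fragment); HrefFileName and FromHref are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;

public static class HrefFileName
{
    // Hypothetical helper: keeps only the last path segment of an href value.
    public static string FromHref(string hrefValue)
    {
        // Drop any query string or fragment first.
        int cut = hrefValue.IndexOfAny(new[] { '?', '#' });
        string path = cut >= 0 ? hrefValue.Substring(0, cut) : hrefValue;

        // LastIndexOf returns -1 when there is no slash, so slash + 1 is 0
        // and the whole string is returned unchanged.
        int slash = path.LastIndexOf('/');
        return path.Substring(slash + 1);
    }
}
```

Inside the foreach loop this could be applied as HrefFileName.FromHref(hrefValue).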
Hope it helps.
Upvotes: 3
Reputation: 127
Try this:
string filename = input.Split('=')[1].Replace("\"", "").Replace(">", "");
Upvotes: 0