Reputation: 29
I need to get every file listed at a URL so that I can iterate over them.
The idea is to resize each image using ImageMagick, but first I need to be able to get the files and iterate over them.
Here is the code I have so far:
using System;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace Example
{
    public class MyExample
    {
        public static void Main(String[] args)
        {
            string url = "https://www.paz.cl/imagenes_cotizador/BannerPrincipal/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    string html = reader.ReadToEnd();
                    Console.WriteLine(html);
                }
            }
            Console.ReadLine();
        }
    }
}
This returns the entire HTML of the page. However, I only need the files (all images) so that I can work with them as expected.
Any idea how to achieve this?
Upvotes: 0
Views: 65
Reputation: 10682
I looked at that page, and it's a directory/file list. You can use Regex to extract all links to images from the body of that page.
Here's a pattern I could think of: HREF="([^"]+\.(jpg|png))
Build your regex object, iterate over the matches, and download each image:
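// Capture group 1 holds the path inside HREF="..."; IgnoreCase also matches a lowercase href
var regex = new System.Text.RegularExpressions.Regex("HREF=\"([^\"]+\\.(jpg|png))", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var matches = regex.Matches(html); // this is your html string
foreach (System.Text.RegularExpressions.Match match in matches) {
    // Read the captured path directly instead of trimming the HREF=" prefix by hand
    var imagePath = match.Groups[1].Value;
    Console.WriteLine(imagePath);
}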
Now concatenate the base URL https://www.paz.cl with the relative image path obtained above, then issue another request to that URL to download the image and process it as you wish.
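To make that last step concrete, here is a minimal sketch of resolving one extracted path against the base URL and saving the file locally. The class name ImageDownloader, both method names, and the target directory are hypothetical names chosen for illustration, and the sketch assumes the listing contains root-relative paths; treat it as a starting point rather than a finished implementation.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class ImageDownloader
{
    // One shared HttpClient for all downloads.
    private static readonly HttpClient Http = new HttpClient();

    // Resolve a path taken from the listing against the site root.
    // Uri handles a leading slash, so no manual string concatenation is needed.
    public static Uri ResolveImageUri(string baseUrl, string imagePath)
    {
        return new Uri(new Uri(baseUrl), imagePath);
    }

    // Download one image into targetDirectory and return the local file path.
    public static async Task<string> DownloadImageAsync(Uri imageUri, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        string localPath = Path.Combine(targetDirectory, Path.GetFileName(imageUri.LocalPath));
        byte[] bytes = await Http.GetByteArrayAsync(imageUri);
        File.WriteAllBytes(localPath, bytes);
        return localPath;
    }
}
```

Inside the loop over the matches you could call ImageDownloader.ResolveImageUri("https://www.paz.cl", imagePath), pass the result to DownloadImageAsync together with a folder of your choice, and then hand the returned file path to ImageMagick.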
Upvotes: 1
Reputation: 21337
You can use AngleSharp to load and parse the HTML page. Then you can extract all the information you need.
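// Requires a reference to the NuGet package AngleSharp
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

internal static class Program
{
    private static async Task Main(string[] args)
    {
        // The default loader lets the browsing context fetch the page over HTTP
        var config = Configuration.Default.WithDefaultLoader();
        var address = "https://www.paz.cl/imagenes_cotizador/BannerPrincipal";
        var context = BrowsingContext.New(config);
        var document = await context.OpenAsync(address);
        // Source URLs of all <img> elements on the page
        var images = document.Images.Select(img => img.Source);
    }
}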
AngleSharp implements the W3C HTML specification, so it tends to handle real-world web pages better than HtmlAgilityPack.
Upvotes: 0
Reputation: 1939
You can use the HTML Agility Pack, for example:
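// Requires the HtmlAgilityPack NuGet package and "using HtmlAgilityPack;"
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html); // html is the page source as a string
// Select only anchors that have an href; SelectNodes returns null when nothing matches
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (htmlNodes != null)
{
    foreach (var node in htmlNodes)
    {
        Console.WriteLine(node.GetAttributeValue("href", ""));
    }
}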
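The loop above prints every link on the page, including entries such as the parent directory, so you would probably want to keep only the image links before downloading anything. A small sketch of that filtering step follows; the class name LinkFilter, the method name KeepImageLinks, and the list of extensions are assumptions made for illustration and should be adjusted to the files you actually expect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the HTML Agility Pack.
public static class LinkFilter
{
    // Extensions treated as images; an assumed list, extend as needed.
    private static readonly string[] ImageExtensions = { ".jpg", ".jpeg", ".png", ".gif" };

    // Keep only the hrefs whose path ends with one of the image extensions.
    public static List<string> KeepImageLinks(IEnumerable<string> hrefs)
    {
        return hrefs
            .Where(href => !string.IsNullOrWhiteSpace(href))
            .Where(href =>
            {
                // Ignore any query string or fragment when checking the extension.
                string path = href.Split('?', '#')[0];
                return ImageExtensions.Any(ext =>
                    path.EndsWith(ext, StringComparison.OrdinalIgnoreCase));
            })
            .ToList();
    }
}
```

You could collect the href values from the loop into a list, pass that list to LinkFilter.KeepImageLinks, and iterate over the result instead of over every anchor.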
Upvotes: 0