Reputation: 149
Based on a previously written code snippet, I'm now trying to download multiple images at once from a certain subreddit into a local directory. My problem is that I can't get my LINQ statement working properly. I also don't want to download the thumbnail pictures, which is why I looked at the HTML page and found that the links I want are at level 5, in the href attribute:
(...)
Level 1: <div class="content">...</div>
Level 2: <div class="spacer">...</div>
Level 3: <div class="siteTable">...</div>
Level 4: <div class=" thing id-t3_6dj7qp odd link ">...</div>
Level 5: <a class="thumbnail may-blank outbound" href="http://i.imgur.com/jZ2ZAyk.jpg">...</a>
This was my best attempt for the line marked '???':
.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))
Unfortunately, it throws an error stating:
Object reference not set to an instance of an object
Now I know that this line is the problem, but I still have no clue how to rewrite it, since I'm fairly new to lambda expressions. To be honest, I don't really understand why I get a System.NullReferenceException in the '???' line but not in the line after it. What's the difference? Maybe my approach to this problem isn't good practice at all, so please let me know how I could proceed.
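A likely reason for the difference is ordering: GetAttributeValue("href", null) returns null for every <a> without an href, and the Where clauses run in sequence, so those nulls reach the '???' predicate before the IsNullOrEmpty filter ever sees them. Directory.GetParent is also a poor fit for URLs: it throws for a null or empty argument and returns null for a root path such as "/", and calling .Equals on that null result raises exactly this NullReferenceException. The sketch below (the class name and sample hrefs are hypothetical) shows both points using only the base class library:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FilterOrderDemo
{
    // Directory.GetParent returns null for a root directory such as "/",
    // so any member call on its result (e.g. .Equals) would throw.
    public static bool ParentIsNull(string path) => Directory.GetParent(path) == null;

    // Dropping null/empty hrefs first means later predicates never receive null.
    public static List<string> NonEmpty(IEnumerable<string> hrefs) =>
        hrefs.Where(h => !string.IsNullOrEmpty(h)).ToList();

    static void Main()
    {
        // Hypothetical hrefs: an image link, an <a> without href (null),
        // an empty href and a root-relative link.
        var hrefs = new List<string> { "http://i.imgur.com/jZ2ZAyk.jpg", null, "", "/" };

        Console.WriteLine(ParentIsNull("/"));     // True
        Console.WriteLine(NonEmpty(hrefs).Count); // 2
    }
}
```

In practice, putting the null filter first (or null-checking inside the predicate) removes the crash; the predicate itself still needs to test the URL rather than a directory path.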
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Net;
using HtmlAgilityPack;

namespace GetAllImages
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> imageLinks = new List<string>();

            // Specify the target directory manually
            string dirName = "Jessica Clements";
            string rootPath = @"C:\Users\Stefan\Desktop";
            string dirPath = Path.Combine(rootPath, dirName);

            // Specify the subreddit manually
            string subReddit = "r/Jessica_Clements";
            string url = @"https://www.reddit.com/" + subReddit;

            try
            {
                DirectoryInfo imageFolder = Directory.CreateDirectory(dirPath);
                HtmlDocument document = new HtmlWeb().Load(url);

                // href value of every <a> element (null when the attribute is missing)
                imageLinks = document.DocumentNode.Descendants("a")
                    .Select(element => element.GetAttributeValue("href", null))
                    .Where(???)
                    .Where(stringLink => !String.IsNullOrEmpty(stringLink))
                    .ToList();

                using (WebClient _wc = new WebClient())
                {
                    foreach (string link in imageLinks)
                    {
                        // Synchronous download: DownloadFileAsync returns immediately, so the
                        // success message could print (and the program exit) before files are written
                        _wc.DownloadFile(new Uri(link), Path.Combine(dirPath, Path.GetFileName(link)));
                    }
                }

                Console.WriteLine($"Files successfully saved in '{Path.GetFileName(dirPath)}'.");
            }
            catch (Exception e)
            {
                // Print the message of the exception and of every inner exception
                while (e != null)
                {
                    Console.WriteLine(e.Message);
                    e = e.InnerException;
                }
            }

            if (System.Diagnostics.Debugger.IsAttached)
            {
                Console.WriteLine("Press any key to continue . . .");
                Console.ReadKey(true);
            }
        }
    }
}
Edit: In case anyone is interested, this is how I made it work in the end, using the answers below:
HtmlDocument document = new HtmlWeb().Load(url);
imageLinks = document.DocumentNode.Descendants("a")
    .Select(element => element.GetAttributeValue("href", null))
    .Where(link => (link?.Contains(@"http://i.imgur.com") == true))
    .Distinct()
    .ToList();
Upvotes: 0
Views: 3237
Reputation: 383
If you are trying to get all links pointing to http://i.imgur.com, you need something like this:
imageLinks = document.DocumentNode.Descendants("a")
    .Select(element => element.GetAttributeValue("href", null))
    .Where(link => link?.Contains(@"http://i.imgur.com") == true)
    .ToList();
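A plain Contains check matches the text anywhere in the URL and skips https://i.imgur.com links. If that matters, one alternative (not part of the answer above) is to parse each href with Uri.TryCreate and compare the host. A minimal sketch, assuming i.imgur.com is the only host of interest; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ImgurFilter
{
    // Keeps absolute http/https links whose host is i.imgur.com.
    // Null, empty and relative hrefs fail Uri.TryCreate; the scheme check also
    // drops paths like "/r/..." that some platforms parse as file:// URIs.
    public static List<string> ImgurLinks(IEnumerable<string> hrefs) =>
        hrefs
            .Where(h => Uri.TryCreate(h, UriKind.Absolute, out Uri uri)
                        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
                        && string.Equals(uri.Host, "i.imgur.com", StringComparison.OrdinalIgnoreCase))
            .Distinct() // the same image is often linked more than once on a page
            .ToList();

    static void Main()
    {
        var hrefs = new[]
        {
            "http://i.imgur.com/jZ2ZAyk.jpg",
            "https://i.imgur.com/jZ2ZAyk.jpg",
            "http://i.imgur.com/jZ2ZAyk.jpg", // duplicate
            null,
            "/r/Jessica_Clements",
            "https://www.reddit.com/?q=http://i.imgur.com" // imgur only in the query string
        };

        foreach (var link in ImgurLinks(hrefs))
            Console.WriteLine(link);
    }
}
```

Only the two imgur links (http and https) are printed; the reddit URL is rejected because its host is www.reddit.com, which a Contains check would not notice.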
Upvotes: 1
Reputation: 5737
Given that this line throws the exception:
.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))
I'd make sure that link is not null and that the result of GetParent(link) is not null either. So you could do:
.Where(link => link != null && (Directory.GetParent(link)?.Equals(@"http://i.imgur.com") ?? false))
Notice the null check and the ?. after GetParent(). This operator stops evaluating the rest of the expression if GetParent() returns null. It is called the null-conditional operator, or "Elvis operator", because it can be seen as two eyes with twirly hair. The ?? false supplies the default value in case evaluation was stopped because of a null.
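To see the two operators in isolation, here is a small sketch. The helper name IsImgurLink is hypothetical, and it calls StartsWith on the string instead of using Directory.GetParent, because GetParent returns a DirectoryInfo and DirectoryInfo.Equals with a string argument is never true. link?.StartsWith(...) has type bool? and is null when link is null; ?? false turns that null into a plain bool:

```csharp
using System;

static class NullConditionalDemo
{
    // link?.StartsWith(...) skips the call and yields null when link is null;
    // "?? false" replaces that null so the method returns a plain bool.
    public static bool IsImgurLink(string link) =>
        link?.StartsWith("http://i.imgur.com", StringComparison.Ordinal) ?? false;

    static void Main()
    {
        Console.WriteLine(IsImgurLink("http://i.imgur.com/jZ2ZAyk.jpg")); // True
        Console.WriteLine(IsImgurLink(null));                             // False
    }
}
```

Comparing with == true, as in the other answer, is an equivalent way to collapse the bool? back to bool.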
However, if you plan to parse HTML code you should definitely have a look at the Html Agility Pack (HAP).
Upvotes: 5