Reputation: 1966
I am looking for specific items on a web page. What I have so far (as a test) works fine, but it looks really ugly to me. I would like suggestions for doing this more concisely, that is, in ONE LINQ query instead of the current two.
document.GetXDocument();
string xmlns = "{http://www.w3.org/1999/xhtml}";
var AllElements = from AnyElement in document.fullPage.Descendants(xmlns + "div")
                  where AnyElement.Attribute("id") != null && AnyElement.Attribute("id").Value == "maincolumn"
                  select AnyElement;
// This first query returns only one LARGE element.
XDocument subdocument = new XDocument(AllElements);
var myElements = from item in subdocument.Descendants(xmlns + "img")
                 where String.IsNullOrEmpty(item.Attribute("src").Value.Trim()) != true
                 select item;
foreach (var element in myElements)
{
    Console.WriteLine(element.Attribute("src").Value.Trim());
}
Assert.IsNotNull(myElements.Count());
I know I could directly look for "img", but I want to be able to get other types of items in those pages, like links and some text.
I strongly doubt this is the best way!
Upvotes: 3
Views: 1504
Reputation: 15357
If you insist on parsing the web page as XML, try this:
var elements =
    from element in document.Descendants(xmlns + "div")
    where (string)element.Attribute("id") == "maincolumn"
    from element2 in element.Descendants(xmlns + "img")
    // The cast yields null when src is missing, so fall back to "" before trimming.
    let src = ((string)element2.Attribute("src") ?? "").Trim()
    where !String.IsNullOrEmpty(src)
    select new {
        element2,
        src
    };
foreach (var item in elements) {
    Console.WriteLine(item.src);
}
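To try the single-query shape without the asker's page loader, here is a minimal self-contained sketch. The inline XHTML fragment and the names page, sources and hasImages are made up for illustration, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample page; the real document comes from the asker's own loader.
string xmlns = "{http://www.w3.org/1999/xhtml}";
XDocument page = XDocument.Parse(@"<html xmlns=""http://www.w3.org/1999/xhtml""><body>
  <div id=""sidebar""><img src=""ignored.png"" /></div>
  <div id=""maincolumn"">
    <img src="" a.png "" />
    <img src=""   "" />
    <img />
    <p><img src=""b.png"" /></p>
  </div>
</body></html>");

var sources =
    (from div in page.Descendants(xmlns + "div")
     where (string)div.Attribute("id") == "maincolumn"      // cast is null when id is absent
     from img in div.Descendants(xmlns + "img")
     let src = ((string)img.Attribute("src") ?? "").Trim()  // tolerate a missing src
     where src.Length > 0
     select src).ToList();

foreach (var src in sources)
    Console.WriteLine(src);

// Any() stops at the first match, which is enough for an existence check.
bool hasImages = sources.Any();
```

With this sample input, only a.png and b.png are printed: the sidebar image, the whitespace-only src and the img without a src are all filtered out.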
Notes:
- What type is document? I am assuming it's an XDocument. If that is the case, you can use Descendants directly on the XDocument. (OTOH, if document is an XDocument, where does that fullPage property come from?)
- Cast the XAttribute to a string. If the attribute is missing, the result of the cast will be null. This saves the double check. (It doesn't offer any performance benefit.)
- Use let to "save" a value for later reuse, in this case for use in the foreach. If all you need is that final Assert, though, it might be more efficient to use Any instead of Count. Any only has to iterate as far as the first result in order to return a value; Count has to iterate over all of them.
- Why is subdocument of type XDocument? Wouldn't XElement be the appropriate type?
- You could also use String.IsNullOrWhiteSpace to check for whitespace in src, instead of String.IsNullOrEmpty, assuming you want to process the src as is, with any whitespace it might have.
Upvotes: 0
Reputation: 9214
The same logic in single query:
var myElements = from element in document.fullPage.Descendants(xmlns + "div")
                 where element.Attribute("id") != null
                       && element.Attribute("id").Value == "maincolumn"
                 from item in new XDocument(element).Descendants(xmlns + "img")
                 where !String.IsNullOrEmpty(item.Attribute("src").Value.Trim())
                 select item;
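One thing to be aware of with new XDocument(element): an element that already has a parent is cloned when it is added to a new document, so the img nodes returned are copies rather than nodes of the original page. The sketch below illustrates this, assuming a made-up inline fragment and C# 9 or later for top-level statements. Calling Descendants on the element itself avoids the copy.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample page, used only to illustrate the cloning behaviour.
string xmlns = "{http://www.w3.org/1999/xhtml}";
XDocument page = XDocument.Parse(
    @"<html xmlns=""http://www.w3.org/1999/xhtml""><body>
        <div id=""maincolumn""><img src=""a.png"" /></div>
      </body></html>");

XElement mainColumn = page.Descendants(xmlns + "div")
    .First(d => (string)d.Attribute("id") == "maincolumn");

// Wrapping an already-parented element in a new XDocument clones it.
XElement copiedImg = new XDocument(mainColumn).Descendants(xmlns + "img").First();
// Querying the element directly returns nodes from the original tree.
XElement directImg = mainColumn.Descendants(xmlns + "img").First();

bool copyIsDetached = copiedImg.Document != page;
bool directIsOriginal = directImg.Document == page;
Console.WriteLine($"copy detached: {copyIsDetached}, direct is original: {directIsOriginal}");
```

If the results are only read, the copy is harmless apart from the extra allocation. If they are later modified or compared with nodes of the original page, the difference matters.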
Upvotes: 1