Reputation: 1953
A colleague at work is having trouble querying a very unusual XML file. After trying to help him, the rest of us have hit a creative block as well. Take a look; it might interest many people here.
Structure:
<Root>
  <MainFoo>
    <Foo>
      <A bla="bla" />
      <B bla1="blablabla" />
      <C bla2="blabla" />
      <Bar N="Education" V="Some Text" />
      <Bar N="Other Node" V="Some other Text" />
      <Bar N="Yet Other Node" V="Some other other Text" />
      <Bar N="fourth Bar Node" V="Some other other otherText" />
      <Bar N="UserID" V="1" />
    </Foo>
    <Foo>
      <A bla="bla" />
      <B bla1="blablabla" />
      <C bla2="blabla" />
      <Bar N="Education" V="Specific Text" />
      <Bar N="Other Node" V="Some other Text" />
      <Bar N="Yet Other Node" V="Some other other Text" />
      <Bar N="fourth Bar Node" V="Some other other otherText" />
      <Bar N="UserID" V="2" />
    </Foo>
    <Foo>
      <A bla="bla" />
      <B bla1="blablabla" />
      <C bla2="blabla" /> <!--***No Bar node with N="Education" in this Foo Node, not a mistake! this might be part of the problem but this is the XML Structure and can't be changed***-->
      <Bar N="Other Node" V="Some other Text" />
      <Bar N="Yet Other Node" V="Some other other Text" />
      <Bar N="fourth Bar Node" V="Some other other otherText" />
      <Bar N="UserID" V="3" />
    </Foo>
    <Foo>
      <A bla="bla" />
      <B bla1="blablabla" />
      <C bla2="blabla" />
      <Bar N="Education" V="Specific Text" />
      <Bar N="Other Node" V="Some other Text" />
      <Bar N="Yet Other Node" V="Some other other Text" />
      <Bar N="fourth Bar Node" V="Some other other otherText" />
      <Bar N="UserID" V="4" />
    </Foo>
  </MainFoo>
  <OtherMainFoo></OtherMainFoo>
  <MoreMainFoo></MoreMainFoo>
</Root>
Now for the issue at hand. Using LINQ to XML, we want the user ID value of every Foo element that meets two conditions: the Foo has a Bar node whose N attribute is "Education", and that Bar node's V attribute does not contain the words we specify in the query.
For example, if we want the user IDs of all Foo nodes whose Education value doesn't contain the word "Some", we should get 2 and 4. Foo number one has a Bar node with N="Education", but its V attribute contains "Some". Foo number three doesn't have a Bar node with N="Education" at all (this is very important, because we think it is one of the reasons we keep getting an empty result whatever we do).
Does any LINQ to XML expert here have an idea? This is a very unusual layout for XML, but it is what we have to deal with, and I think this question will interest many people here.
Upvotes: 0
Views: 171
Reputation: 236218
string text = "Some";
var query = from foo in xdoc.Descendants("Foo")
            let user = foo.Element("User")
            where user != null &&
                  foo.Elements("Bar")
                     .Any(bar => (string)bar.Attribute("N") == "Education" &&
                                 !Regex.IsMatch((string)bar.Attribute("V") ?? "", text,
                                                RegexOptions.IgnoreCase))
            select (int)user.Attribute("ID");
// result: 2, 4
I used a regular expression to search for the word in the Bar element's V attribute so that the search is case-insensitive. Regex.IsMatch throws an ArgumentNullException when its input is null, so a missing V attribute is coalesced to an empty string first. You can also change the pattern to match a whole word rather than part of a word.
If all Foo nodes have a User element, you can remove the null check for user. And if Bar elements always contain a V attribute and you don't need a case-insensitive search, the query can be simplified:
var query = from foo in xdoc.Descendants("Foo")
            where foo.Elements("Bar")
                     .Any(bar => (string)bar.Attribute("N") == "Education" &&
                                 !((string)bar.Attribute("V")).Contains(text))
            select (int)foo.Element("User").Attribute("ID");
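The remark above about matching a whole word rather than part of a word can be sketched roughly as follows. This is only an illustration, written as C# top-level statements: the `containsWord` helper is a hypothetical name that does not appear in the thread, and it assumes the search term should be treated as literal text (hence `Regex.Escape`) and wrapped in `\b` word boundaries.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when `word` occurs in `value` as a whole word, ignoring case.
// A null value (for example a missing V attribute) is treated as an empty string.
Func<string, string, bool> containsWord = (value, word) =>
    Regex.IsMatch(value ?? string.Empty,
                  @"\b" + Regex.Escape(word) + @"\b",
                  RegexOptions.IgnoreCase);

Console.WriteLine(containsWord("Some Text", "some"));     // True: "Some" is a whole word
Console.WriteLine(containsWord("Awesome Text", "some"));  // False: "some" is only part of "Awesome"
Console.WriteLine(containsWord(null, "some"));            // False: nothing to match
```

Inside the query, a check like `!containsWord((string)bar.Attribute("V"), text)` could stand in for the `Regex.IsMatch` call if whole-word matching is what you need.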
Upvotes: 2
Reputation: 70523
tl;dr:
var hasEducation = contacts.Elements("MainFoo").Elements("Foo")
    .Where(foo => foo.Elements("Bar")
        .Any(bar => (bar.Attribute("N").Value == "Education") &&
                    (!bar.Attribute("V").Value.ToLower().Contains("some"))))
Note: I tested this with LinqPad (http://www.linqpad.net/); use it and love it. LinqPad is perfect for these problems. Below is the full source for a LinqPad query that you can test and play with yourself.
The main Where works on each Foo element. It then checks that element's children (specifically the "Bar" elements and their attributes) against the rules you wish to apply.
The key question here is how maintainable this type of query is. Will you be able to maintain a LINQ query like this? Try working with LinqPad; I believe it will make modifying and developing these queries easier for you (or anyone).
To get a list of user IDs (as in John's answer), you can just add
.Select(foo => foo.Element("User").Attribute("ID").Value);
to the end of the query above.
Of course that does not include John's sexy error checking.
XElement contacts = XElement.Parse (@"
<Root>
<MainFoo>
<Foo>
<A bla='bla' />
<B bla1='blablabla' />
<C bla2='blabla' />
<Bar N='Education' V='Some Text' />
<Bar N='Other Node' V='Some other Text' />
<Bar N='Yet Other Node' V='Some other other Text' />
<Bar N='fourth Bar Node' V='Some other other otherText' />
<User ID='1' />
</Foo>
<Foo>
<A bla='bla' />
<B bla1='blablabla' />
<C bla2='blabla' />
<Bar N='Education' V='Specific Text' />
<Bar N='Other Node' V='Some other Text' />
<Bar N='Yet Other Node' V='Some other other Text' />
<Bar N='fourth Bar Node' V='Some other other otherText' />
<User ID='2' />
</Foo>
<Foo>
<A bla='bla' />
<B bla1='blablabla' />
<C bla2='blabla' /> <!--***No Bar node with N='Education' in this Foo Node, not a mistake! this might be part of the problem but this is the XML Structure and can't be changed***-->
<Bar N='Other Node' V='Some other Text' />
<Bar N='Yet Other Node' V='Some other other Text' />
<Bar N='fourth Bar Node' V='Some other other otherText' />
<User ID='3' />
</Foo>
<Foo>
<A bla='bla' />
<B bla1='blablabla' />
<C bla2='blabla' />
<Bar N='Education' V='Specific Text' />
<Bar N='Other Node' V='Some other Text' />
<Bar N='Yet Other Node' V='Some other other Text' />
<Bar N='fourth Bar Node' V='Some other other otherText' />
<User ID='4' />
</Foo>
</MainFoo>
<OtherMainFoo></OtherMainFoo>
<MoreMainFoo></MoreMainFoo>
</Root>");
var hasEducation = contacts.Elements("MainFoo").Elements("Foo")
    .Where(foo => foo.Elements("Bar")
        .Any(bar => (bar.Attribute("N").Value == "Education") &&
                    (!bar.Attribute("V").Value.ToLower().Contains("some"))))
    .Dump();
Upvotes: 2
Reputation: 2011
To keep your options open, here's a solution that uses XPath instead of LINQ. This doesn't include the error checking as per John's answer, but it works all the same.
// Requires: using System.Xml.XPath; (for the XPathSelectElements extension method)
public static IEnumerable<string> GetIDs(XDocument doc, string negation)
{
    // The following XPath string selects all Foo elements that contain a Bar child
    // that has an N attribute with the value "Education" and also has a V attribute
    // that does not contain the specified string. Both tests sit inside one Bar[...]
    // predicate so that they apply to the same Bar element; contains(Bar/@V, ...)
    // would only inspect the V attribute of the first Bar child.
    string xPathString = String.Format("//Foo[Bar[@N = 'Education' and not(contains(@V, '{0}'))]]", negation);
    return doc.Root
              .XPathSelectElements(xPathString) // Select the matching Foo elements
              .Select(a => a.Element("User").Attribute("ID").Value); // Return the ID attribute of each Foo's User element
}
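One XPath 1.0 detail is worth checking when writing a filter like this: when a node-set such as `Bar/@V` is passed to `contains()`, only its first node (in document order) is converted to a string. The sketch below, written as C# top-level statements, compares a "flat" filter with a nested `Bar[...]` predicate. The two-Foo document and the `idsFor` helper are made up for illustration; the Education Bar is deliberately not the first Bar child, which is the only case where the two forms differ.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Illustrative document: the Education Bar is deliberately NOT the first Bar child.
var doc = XDocument.Parse(@"
<Root>
  <Foo><Bar N='Other Node' V='Some other Text' /><Bar N='Education' V='Specific Text' /><User ID='1' /></Foo>
  <Foo><Bar N='Other Node' V='Plain Text' /><Bar N='Education' V='Some Text' /><User ID='2' /></Foo>
</Root>");

// Two separate Bar tests: contains() only sees the V attribute of the first Bar.
string flat = "//Foo[Bar/@N = 'Education' and not(contains(Bar/@V, 'Some'))]";
// One nested predicate: both conditions apply to the same Bar element.
string nested = "//Foo[Bar[@N = 'Education' and not(contains(@V, 'Some'))]]";

// Hypothetical helper: comma-separated User IDs of the Foo elements an XPath selects.
Func<string, string> idsFor = xpath => string.Join(",",
    doc.XPathSelectElements(xpath).Select(f => (string)f.Element("User").Attribute("ID")));

Console.WriteLine(idsFor(flat));    // 2 (judged by the first Bar, not the Education Bar)
Console.WriteLine(idsFor(nested));  // 1 (judged by the Education Bar itself)
```

On the sample XML from the question the Education Bar happens to be the first Bar in every Foo that has one, so both forms return the same IDs there.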
Upvotes: 2
Reputation: 161773
The following seems to work:
public static IEnumerable<int> QueryComplexXml()
{
    var doc = XDocument.Parse(XML);
    if (doc.Root == null)
    {
        throw new System.InvalidOperationException("No root");
    }
    var mainFoo = doc.Root.Element("MainFoo");
    if (mainFoo == null)
    {
        throw new System.InvalidOperationException("No MainFoo");
    }
    var userIDs = from foo in mainFoo.Elements("Foo")
                  where
                      foo.Elements("Bar")
                         .Any(
                             bar =>
                             bar.Attribute("N").Value == "Education" &&
                             bar.Attribute("V").Value == "Specific Text")
                  let user = foo.Element("User")
                  where user != null
                  select int.Parse(user.Attribute("ID").Value);
    return userIDs;
}
The code considers all of the "Foo" elements, but keeps only those that have a "Bar" element with an "N" attribute of "Education" and a "V" attribute of "Specific Text" (you can put any predicate you want right there). For each of the selected elements, it pulls out the "User" element (assuming there is one), then parses and returns its "ID" attribute.
In the example XML you posted, this returns 2 and 4.
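The queries in this thread read the ID from a `<User ID="..." />` element, whereas the XML as shown in the question stores it in `<Bar N="UserID" V="..." />`. If your file really looks like the latter, a variant along these lines may be closer to what you need. It is a sketch written as C# top-level statements, not a drop-in solution: the XML is a trimmed-down copy of the question's structure, `barNamed` is a hypothetical helper, and it assumes at most one Education Bar and one UserID Bar per Foo (only the first of each is inspected).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Trimmed-down copy of the question's XML: the user ID lives in <Bar N='UserID' V='...' />.
var xdoc = XDocument.Parse(@"
<Root>
  <MainFoo>
    <Foo><Bar N='Education' V='Some Text' /><Bar N='UserID' V='1' /></Foo>
    <Foo><Bar N='Education' V='Specific Text' /><Bar N='UserID' V='2' /></Foo>
    <Foo><Bar N='Other Node' V='Some other Text' /><Bar N='UserID' V='3' /></Foo>
    <Foo><Bar N='Education' V='Specific Text' /><Bar N='UserID' V='4' /></Foo>
  </MainFoo>
</Root>");

string text = "Some";

// Hypothetical helper: the first Bar child of a Foo whose N attribute equals `name`, or null.
Func<XElement, string, XElement> barNamed = (foo, name) =>
    foo.Elements("Bar").FirstOrDefault(b => (string)b.Attribute("N") == name);

List<string> userIds =
    (from foo in xdoc.Descendants("Foo")
     let education = barNamed(foo, "Education")
     let userId = barNamed(foo, "UserID")
     where education != null && userId != null          // skips Foo nodes without an Education Bar
     let value = (string)education.Attribute("V") ?? string.Empty
     where value.IndexOf(text, StringComparison.OrdinalIgnoreCase) < 0
     select (string)userId.Attribute("V")).ToList();

Console.WriteLine(string.Join(",", userIds));  // 2,4
```

The IDs are kept as strings here because the question asks for them "into a string"; use `int.Parse` or an `(int)` cast on the attribute if you want numbers instead.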
Upvotes: 1