I have some XML files that may contain nodes of the form <funding-source><institution-wrap>...</institution-wrap></funding-source>.
I want to read the value inside each such node (if any) and match it against the <skos> nodes of another XML file, funding_info.xml.
If there is a match, I want to take the id attribute value of the matching node's ancestor <skd> node and then replace the <funding-source><institution-wrap>...</institution-wrap></funding-source>
in the main XML file with <funding-source><institution-wrap>...</institution-wrap><fundref-id>The attribute value found</fundref-id></funding-source>.
funding_info.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<item>
    <skd id="inst/10.1.3169">
        <skosl>
            <skos>NSF</skos>
        </skosl>
        <skosl>
            <skos>National Science Foundation</skos>
        </skosl>
        <skosl>
            <skos>Jatio Bigyan Songothon</skos>
        </skosl>
    </skd>
    <skd id="inst/10.1.4560">
        <skosl>
            <skos>Massachusetts Institute of Technology</skos>
        </skosl>
        <skosl>
            <skos>MIT</skos>
        </skosl>
        <skosl>
            <skos>Massachusetts Institute of Technology, USA</skos>
        </skosl>
    </skd>
    <skd id="inst/11.2.30213">
        <skosl>
            <skos>European Union</skos>
        </skosl>
        <skosl>
            <skos>European Union</skos>
        </skosl>
        <skosl>
            <skos>European Union FP7 Programme</skos>
        </skosl>
    </skd>
</item>
For example, if the XML file that I want to modify contains some nodes like
<funding-source><institution-wrap>NSF</institution-wrap></funding-source>
<funding-source><institution-wrap>Caltech</institution-wrap></funding-source>
<funding-source><institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap></funding-source>
the output should be
<funding-source><institution-wrap>NSF</institution-wrap><fundref-id>10.1.3169</fundref-id></funding-source>
<funding-source><institution-wrap>Caltech</institution-wrap></funding-source>
<funding-source><institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap><fundref-id>10.1.4560</fundref-id></funding-source>
Since Caltech is not found in any <skos> node in funding_info.xml, its node is left unchanged.
I'm not sure how to approach this. Below is what I've tried, but I got stuck midway:
static void Main(string[] args)
{
    XDocument doc = XDocument.Load(@"C:\Users\Desktop\my_sample.xml", LoadOptions.PreserveWhitespace);
    var x = doc.Descendants("funding-source").Elements("institution-wrap").Select(a => a.Value).ToArray();
    if (x.Any())
    {
        foreach (var cont in x)
        {
            XDocument doc2 = XDocument.Load(@"C:\Users\Desktop\funding_info.xml",
                LoadOptions.PreserveWhitespace);
            var y = doc2.Descendants("skos").Ancestors("skosl").Ancestors("skd").Attributes("id")
                .Select(a => a.Value);
            if (doc2.Descendants("skos").Any().Value(cont))
            {
                var y = doc2.Descendants("skos").Ancestors("skosl").Ancestors("skd").Attributes("id")
                    .Select(a => a.Value).First();
                ............. ...................
                ............. ..................
            }
        }
    }
    Console.ReadLine();
}
Upvotes: 0
Views: 52
Reputation: 134611
Read in your funding_info.xml file and create a mapping between institution names and skd ids. Then with that, you could look through all funding-source elements and check if they already have the id. If not, look in that mapping to see if it has a known value. If it does, add the id.
var fundingDoc = XDocument.Load(pathToFundingInfo);
// creating a lookup since there are multiple instances of the institutions
var skdIds = fundingDoc.Descendants("skd").Elements("skosl")
    .ToLookup(s => (string)s.Element("skos"), s => (string)s.Parent.Attribute("id"));
var outDoc = XDocument.Load(pathToUpdatedFile);
foreach (var f in outDoc.Descendants("funding-source"))
{
    if (f.Element("fundref-id") == null)
    {
        var name = (string)f.Element("institution-wrap");
        var skd = skdIds[name].FirstOrDefault(); // just take the first one
        if (skd != null)
            f.Add(new XElement("fundref-id", skd.Substring("inst/".Length)));
    }
}
outDoc.Save(pathToUpdatedFile);
This should produce output like this:
<root>
    <funding-source>
        <institution-wrap>NSF</institution-wrap>
        <fundref-id>10.1.3169</fundref-id>
    </funding-source>
    <funding-source>
        <institution-wrap>Caltech</institution-wrap>
    </funding-source>
    <funding-source>
        <institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap>
        <fundref-id>10.1.4560</fundref-id>
    </funding-source>
</root>
If you want to make this case-insensitive, make the keys of the lookup all upper- or lower-case.
// ...
var skdIds = fundingDoc.Descendants("skd").Elements("skosl")
    .ToLookup(s => s.Element("skos").Value.ToUpperInvariant(), s => (string)s.Parent.Attribute("id"));
// ...
var name = f.Element("institution-wrap").Value.ToUpperInvariant();
// ...
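As an alternative to changing the case of the keys, ToLookup also has an overload that takes an IEqualityComparer<TKey>, so passing StringComparer.OrdinalIgnoreCase should give the same case-insensitive matching without altering the strings. Below is a minimal self-contained sketch of that idea. The inline XML stands in for the real files, and the lower-case "nsf" entry is made up for illustration; adapt the loading and saving to your own paths.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Inline stand-ins (assumptions) for funding_info.xml and the file being updated.
        var fundingDoc = XDocument.Parse(
            "<item><skd id=\"inst/10.1.3169\">" +
            "<skosl><skos>NSF</skos></skosl>" +
            "<skosl><skos>National Science Foundation</skos></skosl>" +
            "</skd></item>");
        var outDoc = XDocument.Parse(
            "<root>" +
            "<funding-source><institution-wrap>nsf</institution-wrap></funding-source>" +
            "<funding-source><institution-wrap>Caltech</institution-wrap></funding-source>" +
            "</root>");

        // Keys keep their original casing; the comparer ignores case when looking up.
        var skdIds = fundingDoc.Descendants("skd").Elements("skosl")
            .ToLookup(s => (string)s.Element("skos") ?? "",
                      s => (string)s.Parent.Attribute("id"),
                      StringComparer.OrdinalIgnoreCase);

        // ToList() takes a snapshot of the matches before the tree is modified.
        foreach (var f in outDoc.Descendants("funding-source").ToList())
        {
            if (f.Element("fundref-id") != null)
                continue; // already has an id

            var name = (string)f.Element("institution-wrap") ?? "";
            var skd = skdIds[name].FirstOrDefault(); // just take the first one
            if (skd != null)
                f.Add(new XElement("fundref-id", skd.Substring("inst/".Length)));
        }

        Console.WriteLine(outDoc.ToString(SaveOptions.DisableFormatting));
    }
}
```

With this sample input, the "nsf" entry should pick up a fundref-id of 10.1.3169 even though the lookup key is "NSF", while the Caltech entry stays as it was. The `?? ""` fallbacks are only there so that a missing <skos> or <institution-wrap> element does not pass null around.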