Bumba

Reputation: 343

Having trouble making a simple XML file modification program?

I have some XML files that may contain nodes of the form <funding-source><institution-wrap>...</institution-wrap></funding-source>. I want to read the value inside each such node (if any) and match it against the <skos> nodes of another XML file, funding_info.xml. If there is a match, I want to take the id attribute of the enclosing <skd> node and replace the <funding-source><institution-wrap>...</institution-wrap></funding-source> in the main XML file with <funding-source><institution-wrap>...</institution-wrap><fundref-id>the attribute value found</fundref-id></funding-source>. funding_info.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<item>
    <skd id="inst/10.1.3169">
        <skosl>
            <skos>NSF</skos>
        </skosl>
        <skosl>
            <skos>National Science Foundation</skos>
        </skosl>
        <skosl>
            <skos>Jatio Bigyan Songothon</skos>
        </skosl>
    </skd>
    <skd id="inst/10.1.4560">
        <skosl>
            <skos>Massachusetts Institute of Technology</skos>
        </skosl>
        <skosl>
            <skos>MIT</skos>
        </skosl>
        <skosl>
            <skos>Massachusetts Institute of Technology, USA</skos>
        </skosl>
    </skd>
    <skd id="inst/11.2.30213">
        <skosl>
            <skos>European Union</skos>
        </skosl>
        <skosl>
            <skos>European Union</skos>
        </skosl>
        <skosl>
            <skos>European Union FP7 Programme</skos>
        </skosl>
    </skd>
</item>

For example, if the XML file that I want to modify contains some nodes like

<funding-source><institution-wrap>NSF</institution-wrap></funding-source>
<funding-source><institution-wrap>Caltech</institution-wrap></funding-source>
<funding-source><institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap></funding-source>

the output should be

<funding-source><institution-wrap>NSF</institution-wrap><fundref-id>10.1.3169</fundref-id></funding-source>
<funding-source><institution-wrap>Caltech</institution-wrap></funding-source>
<funding-source><institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap><fundref-id>10.1.4560</fundref-id></funding-source>

Since Caltech is not found in any <skos> node in funding_info.xml, that node is left unchanged. I'm not sure how to approach this. Below is what I've tried, but I got stuck midway:

  static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(@"C:\Users\Desktop\my_sample.xml", LoadOptions.PreserveWhitespace);
            var x = doc.Descendants("funding-source").Elements("institution-wrap").Select(a => a.Value).ToArray();
            if (x.Any())
            {
                foreach (var cont in x)
                {
                    XDocument doc2 = XDocument.Load(@"C:\Users\Desktop\funding_info.xml",
                        LoadOptions.PreserveWhitespace);
                    var y = doc2.Descendants("skos").Ancestors("skosl").Ancestors("skd").Attributes("id")
                        .Select(a => a.Value);
                    if (doc2.Descendants("skos").Any().Value(cont))
                    {
                        var y = doc2.Descendants("skos").Ancestors("skosl").Ancestors("skd").Attributes("id")
                            .Select(a => a.Value).First();
............. ...................
............. ..................                        

                    }
                }
            }


            Console.ReadLine();
        }

Upvotes: 0

Views: 52

Answers (1)

Jeff Mercado

Reputation: 134611

Read in your funding_info.xml file and create a mapping from institution names to skd ids. With that mapping, loop through all funding-source elements and check whether each one already has a fundref-id. If it doesn't, look up its institution name in the mapping and, if there is a known value, add the id.

// requires: using System.Linq; using System.Xml.Linq;
var fundingDoc = XDocument.Load(pathToFundingInfo);
// use a lookup rather than a dictionary, since the same institution name can appear more than once
var skdIds = fundingDoc.Descendants("skd").Elements("skosl")
    .ToLookup(s => (string)s.Element("skos"), s => (string)s.Parent.Attribute("id"));
var outDoc = XDocument.Load(pathToUpdatedFile);
foreach (var f in outDoc.Descendants("funding-source"))
{
    if (f.Element("fundref-id") == null)
    {
        var name = (string)f.Element("institution-wrap");
        var skd = skdIds[name].FirstOrDefault(); // just take the first one
        if (skd != null)
            f.Add(new XElement("fundref-id", skd.Substring("inst/".Length)));
    }
}
outDoc.Save(pathToUpdatedFile);

This should produce output like this:

<root>
  <funding-source>
    <institution-wrap>NSF</institution-wrap>
    <fundref-id>10.1.3169</fundref-id>
  </funding-source>
  <funding-source>
    <institution-wrap>Caltech</institution-wrap>
  </funding-source>
  <funding-source>
    <institution-wrap>Massachusetts Institute of Technology, USA</institution-wrap>
    <fundref-id>10.1.4560</fundref-id>
  </funding-source>
</root>

If you want to make this case-insensitive, make the keys of the lookup all upper- or lower-case.

// ...
var skdIds = fundingDoc.Descendants("skd").Elements("skosl")
    .ToLookup(s => s.Element("skos").Value.ToUpperInvariant(), s => (string)s.Parent.Attribute("id"));
// ...
        var name = f.Element("institution-wrap").Value.ToUpperInvariant();
// ...
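
As an alternative to normalizing the keys yourself, ToLookup has an overload that takes an IEqualityComparer, so you could pass StringComparer.OrdinalIgnoreCase and leave the names untouched. Below is a minimal self-contained sketch of that idea. The inline XML is a trimmed copy of the sample data from the question, the <root> wrapper is a placeholder for whatever your real document uses, and the lowercase "nsf" entry is made up to show the case-insensitive match. It also assumes every id starts with "inst/".

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Trimmed inline copies of the sample data (assumption: your real files are loaded with XDocument.Load).
        var fundingDoc = XDocument.Parse(@"<item>
  <skd id=""inst/10.1.3169""><skosl><skos>NSF</skos></skosl></skd>
  <skd id=""inst/10.1.4560""><skosl><skos>MIT</skos></skosl></skd>
</item>");
        var outDoc = XDocument.Parse(@"<root>
  <funding-source><institution-wrap>nsf</institution-wrap></funding-source>
  <funding-source><institution-wrap>Caltech</institution-wrap></funding-source>
</root>");

        // The comparer makes key matching case-insensitive without changing the stored keys.
        var skdIds = fundingDoc.Descendants("skd").Elements("skosl").ToLookup(
            s => (string)s.Element("skos"),
            s => (string)s.Parent.Attribute("id"),
            StringComparer.OrdinalIgnoreCase);

        // ToList() takes a snapshot so the tree is not modified while it is being enumerated.
        foreach (var f in outDoc.Descendants("funding-source").ToList())
        {
            if (f.Element("fundref-id") != null)
                continue; // already has an id, leave it alone

            var name = (string)f.Element("institution-wrap");
            var skd = skdIds[name].FirstOrDefault(); // just take the first match
            if (skd != null)
                f.Add(new XElement("fundref-id", skd.Substring("inst/".Length)));
        }

        Console.WriteLine(outDoc);
    }
}
```

Here the "nsf" entry should pick up a fundref-id of 10.1.3169 even though the funding file spells it "NSF", while the Caltech entry is left as it was.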

Upvotes: 1
