Don_B
Don_B

Reputation: 243

Using exceldatareader to take data from excel file and paste it in a xml file?

I have some xml files in the format

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with OASIS Tables v1.0 20120330//EN" "JATS-journalpublishing-oasis-article1.dtd">
<article article-type="proceedings" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://www.niso.org/standards/z39-96/ns/oasis-exchange/table">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id"/>
<journal-title-group>
<journal-title>Eleventh International Conference on Correlation Optics</journal-title>
</journal-title-group>
<issn pub-type="epub">0277-786X</issn>
<publisher>
<publisher-name>SPIE</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">@@</article-id>
<title-group>
<article-title>Comparison of absorption spectra of adenocarcinoma and squamous cell carcinoma cervical tissue</article-title>
</title-group>
<pub-date>
<year>2013</year>
</pub-date>
<volume>9066</volume>
<issue>90661T</issue>
<isbn content-type="conf-proc">978-1-5106-0374-5</isbn>
<fpage>90661T-1</fpage>
<lpage>90661T-7</lpage>
</article-meta>
</front>

and I have xlsx file like below

Filename        Pages       DOI             Publisher   Year    Price
01022018_1020   14    10.1111/12.2033558    Springer    2017    $15.13 
01022018_1021   7     10.1111/12.2033559    Elsevier    2018    $15.25 
01022018_1130   30    10.1113/20.3023658    Elsevier    2017    $19.25 
01022018_1301   10    10.1113/20.3023777    Wiley       2015    $11.50 

I want to get the respective values of the fields DOI and Publisher for each files(searching by file name) and update them to the xml files i.e. if the file name is 01022018_1021.xml then get its respective DOI and Publisher i.e. 10.1111/12.2033559 and Elsevier and put them inside nodes like <publisher-name>Elsevier</publisher-name> and <article-id pub-id-type="doi">10.1111/12.2033559</article-id>

I've never used exceldatareader before but I tried like this

var xmlFiles=Directory.EnumerateFiles(@"D:\test\testobject","*.xml",SearchOption.AllDirectories);
Parallel.ForEach(xmlFiles,xmlFile=>{
    var name=Path.GetFileName(xmlFile);

    FileStream stream = File.Open(@"C:\Don\Downloads\Download-Sample-File-xlsx.xlsx", FileMode.Open, FileAccess.Read);
    IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);

    DataSet result = excelReader.AsDataSet(new ExcelDataSetConfiguration() {
    ConfigureDataTable = (_) => new ExcelDataTableConfiguration() {
    UseHeaderRow = true
    }
    });

    DataTable dt = result.Tables[0];

    foreach (DataRow row in dt.Rows)
        {
        var x = (string)row["Publisher"];
        var y = (string)row["DOI"];
        /*how do I get the required values
        var pub_name="...";
        var doi="...";
        */
        XDocument doc=XDocument.Load(xmlFile);
        doc.Descendants("pub-name").First().Value=pub_name;
        doc.Descendants("article-id").First().Value=doi;
        doc.Save(xmlFile);

        }
    excelReader.Close();
});

Console.WriteLine("Done");
Console.ReadLine();

Can anyone help?

Upvotes: 0

Views: 2274

Answers (1)

asherber
asherber

Reputation: 2713

It looks like you already have the values.

doc.Descendants("pub-name").First().Value=x;
doc.Descendants("article-id").First().Value=y;

Edit: I didn't catch the bit about needing to find the right row.

DataTable dt = result.Tables[0];

var filename = Path.GetFileNameWithoutExtension(xmlFile);

// Get the row for the current file
var rowForFile = dt.Rows.Cast<DataRow>()
    .SingleOrDefault(r => (string)r["Filename"] == filename);

if (rowForFile != null)
{
    XDocument doc = XDocument.Load(xmlFile);
    doc.Descendants("publisher-name").First().Value = (string)rowForFile["Publisher"];
    doc.Descendants("article-id").First().Value = (string)rowForFile["DOI"];
    doc.Save(xmlFile);
}

Upvotes: 2

Related Questions