Reputation: 243
I have some xml files in the format
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with OASIS Tables v1.0 20120330//EN" "JATS-journalpublishing-oasis-article1.dtd">
<article article-type="proceedings" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://www.niso.org/standards/z39-96/ns/oasis-exchange/table">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id"/>
<journal-title-group>
<journal-title>Eleventh International Conference on Correlation Optics</journal-title>
</journal-title-group>
<issn pub-type="epub">0277-786X</issn>
<publisher>
<publisher-name>SPIE</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">@@</article-id>
<title-group>
<article-title>Comparison of absorption spectra of adenocarcinoma and squamous cell carcinoma cervical tissue</article-title>
</title-group>
<pub-date>
<year>2013</year>
</pub-date>
<volume>9066</volume>
<issue>90661T</issue>
<isbn content-type="conf-proc">978-1-5106-0374-5</isbn>
<fpage>90661T-1</fpage>
<lpage>90661T-7</lpage>
</article-meta>
</front>
and I have xlsx file like below
Filename Pages DOI Publisher Year Price
01022018_1020 14 10.1111/12.2033558 Springer 2017 $15.13
01022018_1021 7 10.1111/12.2033559 Elsevier 2018 $15.25
01022018_1130 30 10.1113/20.3023658 Elsevier 2017 $19.25
01022018_1301 10 10.1113/20.3023777 Wiley 2015 $11.50
I want to get the respective values of the fields DOI and Publisher for each files(searching by file name) and update them to the xml files i.e. if the file name is 01022018_1021.xml then get its respective DOI and Publisher i.e. 10.1111/12.2033559 and Elsevier and put them inside nodes like <publisher-name>Elsevier</publisher-name>
and <article-id pub-id-type="doi">10.1111/12.2033559</article-id>
I've never used exceldatareader before but I tried like this
var xmlFiles=Directory.EnumerateFiles(@"D:\test\testobject","*.xml",SearchOption.AllDirectories);
Parallel.ForEach(xmlFiles,xmlFile=>{
var name=Path.GetFileName(xmlFile);
FileStream stream = File.Open(@"C:\Don\Downloads\Download-Sample-File-xlsx.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
DataSet result = excelReader.AsDataSet(new ExcelDataSetConfiguration() {
ConfigureDataTable = (_) => new ExcelDataTableConfiguration() {
UseHeaderRow = true
}
});
DataTable dt = result.Tables[0];
foreach (DataRow row in dt.Rows)
{
var x = (string)row["Publisher"];
var y = (string)row["DOI"];
/*how do I get the required values
var pub_name="...";
var doi="...";
*/
XDocument doc=XDocument.Load(xmlFile);
doc.Descendants("pub-name").First().Value=pub_name;
doc.Descendants("article-id").First().Value=doi;
doc.Save(xmlFile);
}
excelReader.Close();
});
Console.WriteLine("Done");
Console.ReadLine();
Can anyone help?
Upvotes: 0
Views: 2274
Reputation: 2713
It looks like you already have the values.
doc.Descendants("pub-name").First().Value=x;
doc.Descendants("article-id").First().Value=y;
Edit: I didn't catch the bit about needing to find the right row.
DataTable dt = result.Tables[0];
var filename = Path.GetFileNameWithoutExtension(xmlFile);
// Get the row for the current file
var rowForFile = dt.Rows.Cast<DataRow>()
.SingleOrDefault(r => (string)r["Filename"] == filename);
if (rowForFile != null)
{
XDocument doc = XDocument.Load(xmlFile);
doc.Descendants("publisher-name").First().Value = (string)rowForFile["Publisher"];
doc.Descendants("article-id").First().Value = (string)rowForFile["DOI"];
doc.Save(xmlFile);
}
Upvotes: 2