Reputation: 369
I'm converting OmniPage XML to ALTO XML, and I decided to use C#.
Here is my sample XML file:
<wd l="821" t="283" r="1363" b="394">
<ch l="821" t="312" r="878" b="394" conf="158">n</ch>
<ch l="888" t="312" r="950" b="394" conf="158">o</ch>
<ch l="955" t="283" r="979" b="394" conf="158">i</ch>
<ch l="989" t="312" r="1046" b="394" conf="158">e</ch>
<ch l="1051" t="312" r="1147" b="394" conf="158">m</ch>
<ch l="1157" t="283" r="1219" b="394" conf="158">b</ch>
<ch l="1224" t="312" r="1267" b="394" conf="198">r</ch>
<ch l="1267" t="283" r="1296" b="394" conf="198">i</ch>
<ch l="1306" t="312" r="1363" b="394" conf="158">e</ch>
</wd>
And here is my code:
XDocument document = XDocument.Load(fileName);
var coordinates = from r in document.Descendants("wd").ToList()
                      .Where(r => (string)r.Attribute("l") != "")
                  select new
                  {
                      left = r.Attribute("l").Value,
                  };
foreach (var item in coordinates)
{
    Console.WriteLine(item.left);
}
Console.ReadLine();
This works with a simple XML file like the one above, but it prints nothing when I use a longer XML file like the one at this link:
http://pastebin.com/LmDHRzC5
Why doesn't it work? The longer file also has wd elements, and they also have an l attribute.
Thank you. I put the long XML on Pastebin because it's too long to post here.
Upvotes: 2
Views: 152
Reputation: 34421
I'm not going to write all the code, but this should get you started. I used LINQ to XML.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            StreamReader reader = new StreamReader(FILENAME);
            // Skip the first line (the XML declaration that specifies UTF-16)
            reader.ReadLine();
            XDocument doc = XDocument.Load(reader);
            // Find <body> by local name, then reuse its default namespace below
            XElement body = doc.Descendants().Where(x => x.Name.LocalName == "body").FirstOrDefault();
            XNamespace ns = body.GetDefaultNamespace();
            var results = new {
                // section -> run -> wd -> ch
                sections = body.Elements(ns + "section").Select(x => new {
                    l = (int)x.Attribute("l"),
                    r = (int)x.Attribute("r"),
                    b = (int)x.Attribute("b"),
                    runs = x.Descendants(ns + "run").Select(y => new {
                        wds = y.Elements(ns + "wd").Select(z => new {
                            chs = z.Elements(ns + "ch").Select(a => new {
                                l = (int?)a.Attribute("l"),
                                t = (int?)a.Attribute("t"),
                                r = (int?)a.Attribute("r"),
                                b = (int?)a.Attribute("b"),
                                conf = (int?)a.Attribute("conf"),
                                value = (string)a
                            }).ToList()
                        }).ToList()
                    }).ToList()
                }).ToList(),
                // dd -> para -> ln -> wd -> ch
                dds = body.Elements(ns + "dd").Select(x => new {
                    l = (int)x.Attribute("l"),
                    r = (int)x.Attribute("r"),
                    b = (int)x.Attribute("b"),
                    paras = x.Elements(ns + "para").Select(y => new {
                        lns = y.Elements(ns + "ln").Select(z => new {
                            wds = z.Elements(ns + "wd").Select(a => new {
                                chs = a.Elements(ns + "ch").Select(b => new {
                                    l = (int?)b.Attribute("l"),
                                    t = (int?)b.Attribute("t"),
                                    r = (int?)b.Attribute("r"),
                                    b = (int?)b.Attribute("b"),
                                    conf = (int?)b.Attribute("conf"),
                                    value = (string)b
                                }).ToList()
                            }).ToList()
                        }).ToList()
                    }).ToList()
                }).ToList(),
            };
        }
    }
}
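Since the end goal is ALTO output, you will probably want each word's text as well as its coordinates. A minimal sketch, assuming a word's text is simply the concatenation of its ch values; the class and method names (WordText, Words) are made up for illustration and are not part of the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordText
{
    // Hypothetical helper: returns the text of every <wd> at or below root,
    // built by joining the values of its <ch> children in document order.
    public static List<string> Words(XElement root)
    {
        // XNamespace.None is returned when no default namespace is declared,
        // so the same code handles the short sample and the namespaced file.
        XNamespace ns = root.GetDefaultNamespace();
        return root.DescendantsAndSelf(ns + "wd")
                   .Select(wd => string.Concat(wd.Elements(ns + "ch").Select(ch => (string)ch)))
                   .ToList();
    }
}
```

For the sample wd element in the question, this should give a single word, "noiembrie".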
Upvotes: 1
Reputation: 68687
Your larger document declares a default namespace on its root element:
<document xmlns="http://www.scansoft.com/omnipage/xml/ssdoc-schema3.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
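Because of that, every element in the file belongs to that namespace, and Descendants("wd"), which looks for a wd element in no namespace, finds nothing. Matching on the local name works: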
document.Descendants().Where(e => e.Name.LocalName == "wd")
Alternatively, you can use one of the other options from the question "Search XDocument using LINQ without knowing the namespace".
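If you would rather qualify the element name than compare local names, you can read the default namespace from the root element and prepend it. This is a rough sketch, not a drop-in fix: the class and method names (NamespaceDemo, LeftEdges) are invented for the example, and it takes the XML as a string rather than loading your file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical helper: returns the non-empty "l" attribute of every <wd>
    // element that lives in the document's default namespace.
    public static string[] LeftEdges(string xml)
    {
        XDocument document = XDocument.Parse(xml);
        // GetDefaultNamespace() yields XNamespace.None when the root declares
        // no default namespace, so namespace-free input still matches.
        XNamespace ns = document.Root.GetDefaultNamespace();
        return document.Descendants(ns + "wd")
                       // Attributes without a prefix are in no namespace,
                       // so "l" needs no qualification.
                       .Select(wd => (string)wd.Attribute("l"))
                       .Where(l => !string.IsNullOrEmpty(l))
                       .ToArray();
    }
}
```

The difference from the LocalName filter is that this only matches wd elements in the document's own namespace, which is usually what you want when the schema is known.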
Upvotes: 2