Dubb

Reputation: 433

Loop through large XML file using XDocument

I have to copy nodes from an existing XML file to a newly created XML file. I'm using an XDocument instance to access the existing XML file. The problem is that the XML file can be quite large (let's say 500K lines; OpenStreetMap data).

What would be the best way to loop through large XML files without causing memory errors?

I currently just use XDocument.Load(path) and loop through doc.Descendants(), but this causes the program to freeze until the loop is done. So I think I have to loop async, but I don't know the best way to achieve this.

Upvotes: 3

Views: 1838

Answers (1)

Fabio

Reputation: 32455

You can use an XmlReader together with an IEnumerable<XElement> iterator to yield only the elements you need.

This approach isn't asynchronous, but it saves memory because you don't need to load the whole file into memory to process it. Only the elements you select for copying are materialized, one at a time.

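using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public IEnumerable<XElement> ReadFile(string pathToTheFile)
{
    using (XmlReader reader = XmlReader.Create(pathToTheFile))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "yourElementName")
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node that follows it, so Read() must not be called again here;
                // otherwise an adjacent matching element would be skipped.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}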
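
To finish the copy without buffering, the yielded elements can be written straight to the new file with an XmlWriter. This is only a sketch: the XmlCopy class, the WriteElements method, and the rootName parameter are hypothetical names chosen for illustration, and it assumes the new file needs just a single root element around the copied nodes.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class XmlCopy
{
    // Writes each element as soon as the iterator produces it, so only one
    // element at a time has to be held in memory. Returns the number copied.
    public static int WriteElements(IEnumerable<XElement> elements, string targetPath, string rootName)
    {
        int copied = 0;
        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter writer = XmlWriter.Create(targetPath, settings))
        {
            writer.WriteStartElement(rootName); // root of the new document (assumed)
            foreach (XElement element in elements)
            {
                element.WriteTo(writer);
                copied++;
            }
            writer.WriteEndElement();
        }
        return copied;
    }
}
```

It could then be combined with the iterator, for example XmlCopy.WriteElements(ReadFile(sourcePath), targetPath, "osm"), where sourcePath, targetPath, and the "osm" root name are placeholders.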

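You can also read the file asynchronously. Note that this version collects the selected elements in a list before returning them, so all matching elements are held in memory at once: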

using System.Collections.Generic;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public async Task<IEnumerable<XElement>> ReadFileAsync(string pathToTheFile)
{
    var elements = new List<XElement>();
    var xmlSettings = new XmlReaderSettings { Async = true };
    using (XmlReader reader = XmlReader.Create(pathToTheFile, xmlSettings))
    {
        await reader.MoveToContentAsync();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "yourElementName")
            {
                // ReadFrom already advances the reader past the element,
                // so only call ReadAsync() when nothing was consumed.
                elements.Add((XElement)XNode.ReadFrom(reader));
            }
            else
            {
                await reader.ReadAsync();
            }
        }
    }

    return elements;
}
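
If you can target .NET Core 3.0 / C# 8 or later (an assumption, since the question doesn't say which framework is in use), an async iterator may combine both ideas: elements are streamed one at a time and the reads are awaited. The following is a sketch only; AsyncXmlStream and ReadElementsAsync are hypothetical names, and it relies on XNode.ReadFromAsync, which is not available on the classic .NET Framework.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Xml;
using System.Xml.Linq;

public static class AsyncXmlStream
{
    // Streams matching elements without collecting them in a list first.
    public static async IAsyncEnumerable<XElement> ReadElementsAsync(
        string path,
        string elementName,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var settings = new XmlReaderSettings { Async = true };
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            await reader.MoveToContentAsync();
            while (!reader.EOF)
            {
                cancellationToken.ThrowIfCancellationRequested();
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // Consumes the element and moves the reader to the next node.
                    yield return (XElement)await XNode.ReadFromAsync(reader, cancellationToken);
                }
                else
                {
                    await reader.ReadAsync();
                }
            }
        }
    }
}
```

The caller would consume it with await foreach (XElement element in AsyncXmlStream.ReadElementsAsync(path, "yourElementName")) { ... }, handling each element before the next one is read.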

Then you can start reading several files at once and await all of the results:

var fileTask1 = ReadFileAsync(filePath1);
var fileTask2 = ReadFileAsync(filePath2);
var fileTask3 = ReadFileAsync(filePath3);

await Task.WhenAll(fileTask1, fileTask2, fileTask3);

// All tasks have completed here, so awaiting one again returns its result immediately
var elementsFromFile1 = await fileTask1;
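
Because every task has the same result type, Task.WhenAll can also hand back the results directly as an array, in the same order as the tasks. A minimal sketch of that variant follows; MultiFileReader and ReadAllAsync are hypothetical names, and the readFileAsync delegate stands in for ReadFileAsync so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class MultiFileReader
{
    public static async Task<List<XElement>> ReadAllAsync(
        IEnumerable<string> paths,
        Func<string, Task<IEnumerable<XElement>>> readFileAsync)
    {
        // Start every read before awaiting any of them.
        Task<IEnumerable<XElement>>[] tasks = paths.Select(readFileAsync).ToArray();

        // For Task<T> inputs, WhenAll yields a T[] ordered like the input tasks.
        IEnumerable<XElement>[] results = await Task.WhenAll(tasks);

        // Flatten the per-file results into a single list.
        return results.SelectMany(r => r).ToList();
    }
}
```

A call might look like await MultiFileReader.ReadAllAsync(new[] { filePath1, filePath2, filePath3 }, ReadFileAsync).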

Upvotes: 7
