Jonathan Underwood

Reputation: 19

Removing duplicate nodes within an Xml file

I need to delete duplicate Field elements from an XML document. I have this LINQ query that collects the duplicated ids, i.e. every id attribute value that appears on more than one element.

Code:

var xtra =
           xdoc.Descendants("Field")
           .GroupBy(g => (string)g.Attribute("id"))
           .Where(g => g.Count() > 1)
           .Select(g => g.Key)
           .ToList();

Now I'm having trouble removing only the extra occurrences of each id. The way I'm currently removing them deletes every element with a duplicated id, so instead of removing just the duplicates it removes the first occurrence as well.

Any idea how to do this with a LINQ query?

Upvotes: 1

Views: 1698

Answers (2)

jdweng

Reputation: 34429

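You can group on Attribute("id").Value (the (string) cast from the question also works, and unlike .Value it does not throw when the attribute is missing), then take the first element of each group. Try this: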

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>" +
            "<Root>" +
              "<Field id =\"1\"></Field>" +
              "<Field id =\"2\"></Field>" +
              "<Field id =\"3\"></Field>" +
              "<Field id =\"1\"></Field>" +
              "<Field id =\"2\"></Field>" +
              "<Field id =\"3\"></Field>" +
              "<Field id =\"4\"></Field>" +
              "<Field id =\"2\"></Field>" +
              "<Field id =\"3\"></Field>" +
              "<Field id =\"4\"></Field>" +
            "</Root>";

            XDocument xdoc = XDocument.Parse(input);

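            // Group the Field elements by id and keep the first element of each group.
            var xtra =
                xdoc.Descendants("Field")
                .GroupBy(g => g.Attribute("id").Value)
                .Select(x => x.First())
                .ToList();

            // Replace the children of Root with that de-duplicated list.
            // This assumes every Field is a direct child of Root, as in the sample above.
            xdoc.Root.ReplaceNodes(xtra);
        }
    }
}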

Upvotes: 0

har07

Reputation: 89305

You can use Skip(1) to select every element except the first from each group, and then call Remove() on the selected elements:

xdoc.Descendants("Field")
    .GroupBy(g => (string)g.Attribute("id"))
    .SelectMany(g => g.Skip(1))
    .Remove();
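
For anyone who wants to try this end to end, here is a minimal sketch of the same Skip(1)/Remove() approach as a small console program. The sample XML and the element text (a to e) are made up for illustration, and the program uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample input (made up for illustration): ids 1 and 2 each appear twice.
XDocument xdoc = XDocument.Parse(
    "<Root>" +
      "<Field id=\"1\">a</Field>" +
      "<Field id=\"2\">b</Field>" +
      "<Field id=\"1\">c</Field>" +
      "<Field id=\"3\">d</Field>" +
      "<Field id=\"2\">e</Field>" +
    "</Root>");

// For each id, skip the first Field and remove the remaining ones from the document.
xdoc.Descendants("Field")
    .GroupBy(f => (string)f.Attribute("id"))
    .SelectMany(g => g.Skip(1))
    .Remove();

// Print the surviving Field elements in document order.
// Expected output:
//   1: a
//   2: b
//   3: d
foreach (XElement field in xdoc.Descendants("Field"))
{
    Console.WriteLine($"{(string)field.Attribute("id")}: {field.Value}");
}
```

GroupBy keeps the elements of each group in source order, so the element that survives for each id is the one that appears first in the document.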

Upvotes: 2
