C# Efficient de-duplication of single Datatable column's data

Question

I have a DataTable with some data (sample below) and need to de-duplicate the values in the Name column by appending [1], [2], etc. to every name that occurs more than once.

My current code (below) works but is slow on large tables. Any tips on the most efficient way of doing this in C# would be appreciated.

Current Table sample:

ID  Name      X   Y
1   John      45  66
2   Paul      44  66
3   George    88  102
4   John      33  90
5   John      53  37
6   Paul      97  65
7   Ringo     01  87
8   Ringo     76  65

Required Table sample:

ID  Name      X   Y
1   John[1]   45  66
2   Paul[1]   44  66
3   George    88  102
4   John[2]   33  90
5   John[3]   53  37
6   Paul[2]   97  65
7   Ringo[1]  01  87
8   Ringo[2]  76  65

Current code below:

foreach (DataRow aRow in ds.Tables[0].Rows) // run through all rows
{
    string aName = aRow["Name"].ToString();
    DataRow[] FoundRows = ds.Tables[0].Select("Name = '" + aName + "'"); // Find all rows with same name

    if (FoundRows.Length > 1) // As will always find itself
    {
        int i = 1;
        foreach (DataRow row in FoundRows)
        {
            row["Name"] = row["Name"].ToString() + "[" + i + "]";
            i++;
        }
        ds.Tables[0].AcceptChanges(); // Ensure the rows are updated before looping around.
    }
}

Nino · Accepted Answer

Here is one approach, using LINQ to group the rows by name:

// Required namespaces: System.Data and System.Linq.
// On .NET Framework, AsEnumerable() also needs a reference to System.Data.DataSetExtensions.
DataTable table = new DataTable();
//test data
table.Columns.Add("Name");
table.Columns.Add("X", typeof(int));
table.Rows.Add(new object[] { "john", 10 });
table.Rows.Add(new object[] { "paul", 44 });
table.Rows.Add(new object[] { "ringo", 312 });
table.Rows.Add(new object[] { "george", 30 });
table.Rows.Add(new object[] { "john", 100 });
table.Rows.Add(new object[] { "paul", 443 });

//converting DataTable to enumerable collection of rows and then grouping by name,
//skipping groups with only one row (such as george or ringo in this test data)
var groupedData = table.AsEnumerable().GroupBy(row => row[0].ToString()).Where(g => g.Count() > 1);

//iterate through each group of rows that share a name
foreach (var group in groupedData)
{
    int counter = 1; //counter for "[x]" suffix
    //iterate through all rows under one name, e.g. john
    foreach (var groupedItem in group)
    {
        //add [x] with no space before the bracket, matching the required output (john[1], john[2], ...)
        groupedItem[0] = string.Format("{0}[{1}]", group.Key, counter);
        counter++;
    }
}

EDIT: simplified code and made it a bit more efficient, as suggested by AdrianWragg
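
For comparison, the same result can probably be reached without LINQ by making two plain passes over the rows with a Dictionary: one pass to count each name, one to append the suffixes. This is only a sketch under a few assumptions: the class name NameDeduplicator and the method AppendDuplicateSuffixes are made up for illustration, the column is assumed to hold strings, and names are compared case-sensitively (as in the GroupBy version above).

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper class, not part of the original answer.
public static class NameDeduplicator
{
    // Appends [1], [2], ... to every value in the given column that occurs
    // more than once; values that occur only once are left unchanged.
    public static void AppendDuplicateSuffixes(DataTable table, string columnName)
    {
        DataColumn column = table.Columns[columnName];

        // Pass 1: count how many times each name occurs.
        var totals = new Dictionary<string, int>();
        foreach (DataRow row in table.Rows)
        {
            string name = row[column].ToString();
            int count;
            totals.TryGetValue(name, out count); // count stays 0 if the name is new
            totals[name] = count + 1;
        }

        // Pass 2: number the rows whose name occurs at least twice.
        var seen = new Dictionary<string, int>();
        foreach (DataRow row in table.Rows)
        {
            string name = row[column].ToString();
            if (totals[name] < 2)
            {
                continue; // unique name, e.g. George in the question's sample
            }

            int index;
            seen.TryGetValue(name, out index);
            index++;
            seen[name] = index;
            row[column] = name + "[" + index + "]";
        }
    }
}
```

Calling NameDeduplicator.AppendDuplicateSuffixes(ds.Tables[0], "Name") would then update the table in place. Each row is read and written once, so the work should grow roughly linearly with the number of rows, whereas the code in the question runs a Select over the whole table for every row.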
