Reputation: 85
I have a DataTable with some data (sample below) and need to de-duplicate the values in the Name column by appending [1], [2], etc. to every name that occurs more than once.
My current code, shown below, works but is slow on large tables. Any tips on the most efficient way of doing this in C# would be appreciated.
Current Table sample:
ID  Name    X   Y
1   John    45  66
2   Paul    44  66
3   George  88  102
4   John    33  90
5   John    53  37
6   Paul    97  65
7   Ringo   01  87
8   Ringo   76  65
Required Table sample:
ID  Name      X   Y
1   John[1]   45  66
2   Paul[1]   44  66
3   George    88  102
4   John[2]   33  90
5   John[3]   53  37
6   Paul[2]   97  65
7   Ringo[1]  01  87
8   Ringo[2]  76  65
Current code below:
foreach (DataRow aRow in ds.Tables[0].Rows) // run through all
{
    string aName = aRow["Name"].ToString();
    DataRow[] FoundRows = ds.Tables[0].Select("Name = '" + aName + "'"); // Find all rows with same name
    if (FoundRows.Length > 1) // As will always find itself
    {
        int i = 1;
        foreach (DataRow row in FoundRows)
        {
            row["Name"] = row["Name"].ToString() + "[" + i + "]";
            i++;
        }
        ds.Tables[0].AcceptChanges(); // Ensure the rows are updated before looping around.
    }
}
Upvotes: 0
Views: 65
Reputation: 9650
A good old for loop that updates the whole table in a single pass will probably be the fastest approach:
var foundNames = new Dictionary<string, int>();
for (int rowInd = 0; rowInd < dataTable.Rows.Count; rowInd++)
{
    // If the name is not yet in foundNames, store its row index and
    // leave the table untouched: this is the only occurrence so far.
    // The index is stored negated to distinguish it from a count.
    //
    // If the name is in foundNames, retrieve the stored value.
    // If it is non-positive (a negated index), this is the second
    // occurrence of the name: update the first-occurrence row as well
    // as the current row, and store a count of 2.
    //
    // If it is positive, this is the third or a later occurrence:
    // update only the current row and store the incremented count.
    var name = dataTable.Rows[rowInd]["Name"].ToString();
    int count;
    if (!foundNames.TryGetValue(name, out count))
        foundNames.Add(name, -rowInd);
    else
    {
        if (count <= 0)
        {
            dataTable.Rows[-count]["Name"] = name + "[1]";
            count = 1;
        }
        count++;
        dataTable.Rows[rowInd]["Name"] = name + "[" + count + "]";
        foundNames[name] = count;
    }
}
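As a rough usage sketch, the same loop can be wrapped in a method and run against the Name column of the sample table from the question. The names NameDeduplicator, SuffixDuplicateNames and Program are hypothetical, chosen only for this illustration, and the X and Y columns are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class NameDeduplicator
{
    // Hypothetical helper: appends [1], [2], ... to every value in columnName
    // that occurs more than once; values that occur once are left untouched.
    public static void SuffixDuplicateNames(DataTable table, string columnName)
    {
        // Per name: the negated index of its first row while it has been seen
        // once, otherwise the number of occurrences seen so far.
        var foundNames = new Dictionary<string, int>();
        for (int rowInd = 0; rowInd < table.Rows.Count; rowInd++)
        {
            string name = table.Rows[rowInd][columnName].ToString();
            int count;
            if (!foundNames.TryGetValue(name, out count))
            {
                foundNames.Add(name, -rowInd);
                continue;
            }
            if (count <= 0)
            {
                // Second occurrence: go back and tag the first one too.
                table.Rows[-count][columnName] = name + "[1]";
                count = 1;
            }
            count++;
            table.Rows[rowInd][columnName] = name + "[" + count + "]";
            foundNames[name] = count;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // ID and Name columns of the sample table from the question.
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        string[] names = { "John", "Paul", "George", "John", "John", "Paul", "Ringo", "Ringo" };
        for (int i = 0; i < names.Length; i++)
            table.Rows.Add(i + 1, names[i]);

        NameDeduplicator.SuffixDuplicateNames(table, "Name");

        foreach (DataRow row in table.Rows)
            Console.WriteLine(row["ID"] + " " + row["Name"]);
    }
}
```

On this input the Name column should end up as John[1], Paul[1], George, John[2], John[3], Paul[2], Ringo[1], Ringo[2], which matches the required table in the question. Each row is read once and the dictionary lookups are constant time on average, so the cost grows linearly with the number of rows.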
Upvotes: 0
Reputation: 7115
Here is one approach:
// requires: using System.Data; using System.Linq;
// (AsEnumerable() comes from System.Data.DataSetExtensions)
DataTable table = new DataTable();
// test data
table.Columns.Add("Name");
table.Columns.Add("X", typeof(int));
table.Rows.Add(new object[] { "john", 10 });
table.Rows.Add(new object[] { "paul", 44 });
table.Rows.Add(new object[] { "ringo", 312 });
table.Rows.Add(new object[] { "george", 30 });
table.Rows.Add(new object[] { "john", 100 });
table.Rows.Add(new object[] { "paul", 443 });
// convert the DataTable to an enumerable collection of rows and group by name,
// skipping groups with only one row (such as george or ringo)
var groupedData = table.AsEnumerable().GroupBy(row => row[0].ToString()).Where(g => g.Count() > 1);
// iterate through each group of <string, DataRow>
foreach (var group in groupedData)
{
    int counter = 1; // counter for the "[x]" suffix
    // iterate through all rows under one name, e.g. john
    foreach (var groupedItem in group)
    {
        // append [x] with no space, matching the format required in the question
        groupedItem[0] = string.Format("{0}[{1}]", group.Key, counter);
        counter++;
    }
}
EDIT: simplified code and made it a bit more efficient, as suggested by AdrianWragg
Upvotes: 1