Reputation: 388
I found plenty of stuff on deleting duplicates, but I couldn't find any guidance for the problem below anywhere...
I'm looking for a way to rename any duplicate rows in a C# DataTable that I've imported to from a CSV. My data looks something like:
**Name** **Item** **Quantity**
ABC Item_Name 6
ABC Item_Name_2 1
DEF Item_Name 3
GHI Item_Name_2 7
ABC Item_Name 6
ABC Item_Name 1
JKL Item_Name_3 4
ABC Item_Name 6
ABC Item_Name 1
JKL Item_Name_3 4
What I would like the data to look like, if a whole row (quantity included) appears more than once, is this:
**Name** **Item** **Quantity**
ABC Item_Name 6
ABC Item_Name_2 1
DEF Item_Name 3
GHI Item_Name_2 7
ABC (2) Item_Name 6
ABC (2) Item_Name_2 1
JKL Item_Name_3 4
ABC (3) Item_Name 6
ABC (3) Item_Name_2 1
JKL (2) Item_Name_3 4
At present, I'm using a double "for loop" to determine which rows further down the table have the same data as the present row and rename them accordingly. Clearly, this has two issues:
It's really slow with tables with a large number of rows, obviously
All future rows whose name column was "ABC" now have a name column "ABC (2)", regardless of whether this is their first second or 98th occurrence (because, to be frank, my code sucks).
Any help anyone can offer greatly appreciated :)
Upvotes: 1
Views: 2512
Reputation: 765
foreach(DataRow row in thisTable.Rows)
{
string name = row.Item[0].ToString();
if(name[name.Length - 3] == '(' && name[name.Length - 1] == ')')
continue;
string item = row.Item[1].ToString();
int quantity = Convert.ToInt32(row.Item[2]);
string expression = "Name = " + name + " and Item = " + item + " and Quantity = " + quantity;
DataRow[] matchingRows = table.Select(expression);
for(int i = 1; i < matchingRows.Length; i++)
matchingRows[i]["Name"] += " (" + i + ")";
}
Basically, I just looped through each row. Then, I query every row that is identical. Then, I loop through those identical rows, and rename them with an appended string of (1), (2), (3), etc. (in the order the query found them). I also skip any rows I've already renamed with that 'if' statement (I assume a renamed row ends with "(" at the 3rd to last char and ")" at its last char). Sorry if I made C#-syntax-related errors in the code. The idea is pretty straightforward, though.
Upvotes: 2
Reputation: 9407
Here's a different solution. In my opinion, more elegant!
DataTable table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Title", typeof(string));
table.Columns.Add("Quantity", typeof(int));
// Data for test
table.Rows.Add("ABC", "Item_name", 6);
table.Rows.Add("ABC", "Item_name", 6);
table.Rows.Add("ABC2", "Item_name", 6);
table.Rows.Add("ABC2", "Item_name", 6);
table.Rows.Add("ABC2", "Item_name", 6);
table.Rows.Add("ABC2", "Item_name", 6);
// Query with Linq
var query = from row in table.AsEnumerable()
group row by new {
name = row.Field<String>("Name"),
title = row.Field<String>("Title")
} into GrpNameTitle
select new {
Name = GrpNameTitle.Key.name + " (" + GrpNameTitle.Count() + ")",
Title = GrpNameTitle.Key.title,
Quantity = GrpNameTitle.First().Field<int>("Quantity")
};
foreach (var itm in query)
{
Console.WriteLine("{0}\t{1}", itm.Name, itm.Title);
}
I have a question, when you filter your rows "group by" on your two columns (name and title) would it not make sense to add/calculate the third column contains the quantity ? For example:
abc item_name 2
abc item_name 2
give this result:
abc(2) item_name 4
Best regards,
Mentor Reka.
Upvotes: 0