LeeCambl
LeeCambl

Reputation: 388

How to count and rename duplicate rows in a DataTable

I found plenty of stuff on deleting duplicates, but I couldn't find any guidance for the problem below anywhere...

I'm looking for a way to rename any duplicate rows in a C# DataTable that I've imported to from a CSV. My data looks something like:

**Name**       **Item**                 **Quantity**
ABC            Item_Name                     6
ABC            Item_Name_2                   1
DEF            Item_Name                     3
GHI            Item_Name_2                   7
ABC            Item_Name                     6
ABC            Item_Name                     1
JKL            Item_Name_3                   4
ABC            Item_Name                     6
ABC            Item_Name                     1
JKL            Item_Name_3                   4

What I would like the data to look like, if a whole row (quantity included) appears more than once, is this:

**Name**          **Item**                 **Quantity**
ABC            Item_Name                        6
ABC            Item_Name_2                      1
DEF            Item_Name                        3
GHI            Item_Name_2                      7
ABC (2)        Item_Name                        6
ABC (2)        Item_Name_2                      1
JKL            Item_Name_3                      4
ABC (3)        Item_Name                        6
ABC (3)        Item_Name_2                      1
JKL (2)        Item_Name_3                      4

At present, I'm using a double "for loop" to determine which rows further down the table have the same data as the present row and rename them accordingly. Clearly, this has two issues:

  1. It's really slow with tables with a large number of rows, obviously

  2. All future rows whose name column was "ABC" now have a name column "ABC (2)", regardless of whether this is their first second or 98th occurrence (because, to be frank, my code sucks).

Any help anyone can offer greatly appreciated :)

Upvotes: 1

Views: 2512

Answers (2)

a_schimpf
a_schimpf

Reputation: 765

    foreach(DataRow row in thisTable.Rows)
    {
        string name = row.Item[0].ToString();

        if(name[name.Length - 3] == '(' && name[name.Length - 1] == ')')
            continue;

        string item = row.Item[1].ToString();
        int quantity = Convert.ToInt32(row.Item[2]);
        string expression = "Name = " + name + " and Item = " + item + " and Quantity = " + quantity;

        DataRow[] matchingRows = table.Select(expression);
        for(int i = 1; i < matchingRows.Length; i++)
            matchingRows[i]["Name"] += " (" + i + ")";  
    }

Basically, I just looped through each row. Then, I query every row that is identical. Then, I loop through those identical rows, and rename them with an appended string of (1), (2), (3), etc. (in the order the query found them). I also skip any rows I've already renamed with that 'if' statement (I assume a renamed row ends with "(" at the 3rd to last char and ")" at its last char). Sorry if I made C#-syntax-related errors in the code. The idea is pretty straightforward, though.

Upvotes: 2

Mentor Reka
Mentor Reka

Reputation: 9407

Here's a different solution. In my opinion, more elegant!

        DataTable table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("Quantity", typeof(int));

        // Data for test
        table.Rows.Add("ABC", "Item_name", 6);
        table.Rows.Add("ABC", "Item_name", 6);
        table.Rows.Add("ABC2", "Item_name", 6);
        table.Rows.Add("ABC2", "Item_name", 6);
        table.Rows.Add("ABC2", "Item_name", 6);
        table.Rows.Add("ABC2", "Item_name", 6);

        // Query with Linq
        var query = from row in table.AsEnumerable()
                    group row by new {
                        name  = row.Field<String>("Name"),
                        title = row.Field<String>("Title")
                    } into GrpNameTitle
                    select new {
                        Name  = GrpNameTitle.Key.name + " (" + GrpNameTitle.Count() + ")", 
                        Title = GrpNameTitle.Key.title,
                        Quantity = GrpNameTitle.First().Field<int>("Quantity")
                    };


        foreach (var itm in query)
        {
            Console.WriteLine("{0}\t{1}", itm.Name, itm.Title);
        }

I have a question, when you filter your rows "group by" on your two columns (name and title) would it not make sense to add/calculate the third column contains the quantity ? For example:

abc item_name 2
abc item_name 2

give this result:

abc(2) item_name 4

Best regards,

Mentor Reka.

Upvotes: 0

Related Questions