jsirr13

Reputation: 1002

Most efficient way to make duplicates unique in collection

I have a collection. If a duplicate is added to it, I want to append the text " - N" to the new item's name, where N is an integer that is not already used by an item in the collection.

For Example, if I have the following list:

and try to add 'item1' again, I want the list to end up like so:

If I try to add 'item1' again, the list will then be:

Pretty straightforward. Below is my simple algorithm, but I'm seeing a noticeable loss in performance when dealing with 10,000 items. Some slowdown is expected, since every attempt scans the whole collection, but are there better approaches? I couldn't find a similar question, so I figured I'd see if anyone has run into the same issue.

Item copyItem = new Item();
string tempName = name;
int copyNumber = 1;
while(copyItem != null)
{
    copyItem = MyCollection.FirstOrDefault(blah => blah.Name == tempName);
    if (copyItem == null)
    {
        name = tempName;
        break;
    }
    tempName = name + " - " + copyNumber;
    ++copyNumber;
}

Upvotes: 0

Views: 109

Answers (3)

FloChanz

Reputation: 3429

Okay, so you need a counter per value rather than a single global one. This code does that:

        // Input for testing purposes
        List<string> values = new List<string> { "item1", "item2", "item1", "item1" };
        // Result data
        List<string> finalResult = new List<string>();

        // 1 - Group the items by value
        var tempResult = from i in values
                         group i by i;

        // Loop over each distinct item name
        foreach (var curItem in tempResult)
        {
            // The group tells us how many items share this name; count it once
            int count = curItem.Count();
            for (int ite = 0; ite < count; ite++)
            {
                if (ite == 0)
                    finalResult.Add(curItem.Key);
                else
                    finalResult.Add(string.Format("{0} - {1}", curItem.Key, ite));
            }
        }

With LINQ method syntax you can reduce the amount of code. The next snippet does exactly the same thing. ToList() forces the query to execute immediately instead of deferring it, which is what makes List<T>.ForEach available; the gain is brevity, and it should not be expected to run noticeably quicker.

        // Input for testing purposes
        List<string> values = new List<string> { "item1", "item2", "item1", "item1" };
        // Result data
        List<string> finalResult = new List<string>();

        // ToList() materializes the groups so that List<T>.ForEach can be called on them
        values.GroupBy(s1 => s1).ToList().ForEach(curItem =>
        {
            int count = curItem.Count();
            for (int ite = 0; ite < count; ite++)
            {
                finalResult.Add(ite == 0 ? curItem.Key : string.Format("{0} - {1}", curItem.Key, ite));
            }
        });

Upvotes: 0

Erik

Reputation: 12858

I would use a Dictionary<string, int> to store the number of duplicates for a particular item. So a helper method would look something like this:

Dictionary<string, int> countDictionary = new Dictionary<string, int>(); // case sensitive!

string GetNameForItem(string itemName)
{
  var name = itemName;

  var count = 0;
  countDictionary.TryGetValue(itemName, out count);

  if (count > 0)
    name = string.Format("{0} - {1}", itemName, count);

  countDictionary[itemName] = count + 1;
  return name;
}
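
To make the behaviour concrete, here is a minimal, self-contained sketch of the same idea wrapped in a class and run against a small input. The names UniqueNamer and UniqueNamerDemo are hypothetical and only exist for this illustration; the logic is meant to mirror the helper above, not to replace it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class; the name UniqueNamer is an assumption for this example.
public class UniqueNamer
{
    // How many times each base name has been requested so far (case sensitive).
    private readonly Dictionary<string, int> countDictionary = new Dictionary<string, int>();

    public string GetNameForItem(string itemName)
    {
        // TryGetValue leaves count at 0 when the name has not been seen yet.
        countDictionary.TryGetValue(itemName, out int count);
        countDictionary[itemName] = count + 1;

        // The first occurrence keeps its name; later ones get " - N".
        return count > 0 ? string.Format("{0} - {1}", itemName, count) : itemName;
    }
}

// Hypothetical demo harness, also an assumption for this example.
public static class UniqueNamerDemo
{
    public static List<string> Run()
    {
        var namer = new UniqueNamer();
        var result = new List<string>();
        foreach (var name in new[] { "item1", "item2", "item1", "item1" })
            result.Add(namer.GetNameForItem(name));
        return result; // item1, item2, item1 - 1, item1 - 2
    }
}
```

Each call is a single dictionary lookup and update, so adding 10,000 items avoids rescanning the collection for every candidate name. Unlike a grouping approach, the items also come out in the order they were added.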

Alternatively, you could split up the operation into several methods if you didn't want GetNameForItem to automatically increment on retrieval:

int GetCountForItem(string itemName)
{
  var count = 0;
  countDictionary.TryGetValue(itemName, out count);

  return count;
}

string GetNameForItem(string itemName)
{
  var name = itemName;
  var count = GetCountForItem(itemName);

  if (count > 0)
    name = string.Format("{0} - {1}", itemName, count);

  return name;
}

int IncrementCountForItem(string itemName)
{
  var newCount = GetCountForItem(itemName) + 1;
  countDictionary[itemName] = newCount;

  return newCount;
}

It is important to note that if you are supporting deletion from the collection, you will have to update the count accordingly:

int DecrementCountForItem(string itemName)
{
  var newCount = Math.Max(0, GetCountForItem(itemName) - 1); // Prevent count from going negative!
  countDictionary[itemName] = newCount;

  return newCount;
}

You will also have to keep in mind what happens if you have two items, say "Item A" and "Item A - 1", then you delete "Item A". Should you rename "Item A - 1" to "Item A"?
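
One possible policy for deletion, sketched under assumptions: decrementing the count can hand out a suffix that is still in use (with "Item A", "Item A - 1" and "Item A - 2", deleting "Item A - 1" and then adding "Item A" again would produce a second "Item A - 2"). The sketch below avoids that by tracking the names actually in use in a HashSet<string> and never lowering the per-name counter. UniqueNameSet and its members are hypothetical names, and the choice never to rename existing items is just one answer to the question above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; UniqueNameSet, Add and Remove are illustrative names.
public class UniqueNameSet
{
    // Every name currently in the collection, for O(1) average membership checks.
    private readonly HashSet<string> usedNames = new HashSet<string>();

    // Per base name, the next suffix to try. It only ever grows.
    private readonly Dictionary<string, int> nextSuffix = new Dictionary<string, int>();

    public string Add(string baseName)
    {
        // HashSet<T>.Add returns false when the value is already present.
        if (usedNames.Add(baseName))
            return baseName;

        if (!nextSuffix.TryGetValue(baseName, out int n))
            n = 1;

        string candidate = baseName + " - " + n;
        // Skip suffixes that are taken, e.g. by an item that was added under that literal name.
        while (!usedNames.Add(candidate))
        {
            n++;
            candidate = baseName + " - " + n;
        }

        nextSuffix[baseName] = n + 1;
        return candidate;
    }

    public bool Remove(string name)
    {
        // Only the name itself is released; counters are left untouched.
        return usedNames.Remove(name);
    }
}
```

With this policy, deleting "Item A - 1" leaves a gap that is not reused, while deleting "Item A" frees the plain name for the next item added under it. Whether gaps should be filled or existing items renamed is a design decision the sketch does not settle.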

Upvotes: 2

Paweł Bejger

Reputation: 6366

I would first sort the values. That way you only need to compare each value with the previous one rather than with the whole collection.

So it could look like this:

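        List<string> values = new List<string> { "item1", "item1", "item1" };

        // Sorting places equal values next to each other
        values.Sort();

        // null never equals a list entry, so the first value is always treated as new
        string previousValue = null;
        int number = 1;
        for (int i = 0; i < values.Count; i++)
        {
            if (values[i].Equals(previousValue))
            {
                // Duplicate of the preceding original value: append the next suffix
                values[i] = previousValue + " - " + number;
                number++;
            }
            else
            {
                // First occurrence of a new value: remember it and restart the numbering
                previousValue = values[i];
                number = 1;
            }
        }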

Upvotes: 2
