Tom

Reputation: 8681

Check for existence of an element in a collection in C#

I am trying to prevent adding an item to a list if it already exists there, in C#. The code below loops through the rows of a DataTable. As you can see, rows is of type List<CubeReportRow>.

The DataTable's rows do contain duplicates. I need to check whether the rowName from the DataTable is already in the rows object of type List<CubeReportRow>. Please see the condition I have set in the foreach loop. When I try to check by rowName, it says it cannot convert string to type CubeReportRow. If I check with if (!rows.Contains(row[0])), there is no compile error, but it doesn't work. How do I check for its existence in the rows collection?

Class CubeReportRow

 public class CubeReportRow
    {
        public string RowName { get; set; }
        public string RowParagraph { get; set; }
        public int ReportSection { get; set; }
    }

C# Method

 public virtual IList<CubeReportRow> TransformResults(CubeReport report,DataTable dataTable)
        {
            if (dataTable.Rows.Count == 0 || dataTable.Columns.Count == 0)
                return new List<CubeReportRow>();

            var rows = new List<CubeReportRow>();
            var columns = columnTransformer.GetColumns(dataTable);

            foreach (DataRow row in dataTable.Rows)
            {
                
                var rowName = row[0].ToString();
                if (!rows.Contains(rowName))
                {
                    var values =
                        cubeReportValueFactory.CreateCubeReportValuesForRow(dataTable, row, rowName, columns, report);

                    var reportRow = new CubeReportRow(row[3].ToString(), row[2].ToString(), row[1].ToString(), values);
                    rows.Add(reportRow);
                }
            }

            return rows;
        }

Upvotes: 0

Views: 1232

Answers (3)

TheGeneral

Reputation: 81493

This is not really an answer, as I believe Guru Stron's answer is sufficient.

However, there are a bunch of ways to do this, and their performance and complexity will differ depending on your data and its duplicate ratio. The following list is not exhaustive.

Dictionary

var rows = new Dictionary<string, CubeReportRow>();
foreach (var dataRow in _data)
   if (!rows.ContainsKey(dataRow.RowName))
      rows.Add(dataRow.RowName, dataRow);
return rows.Values.ToList();

HashSet

var hashSet = new HashSet<string>(_data.Length);
return _data.Where(x => hashSet.Add(x.RowName)).ToList();
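
The HashSet version works because HashSet<T>.Add returns false when the value is already in the set, so the Where predicate only lets through the first row seen for each name. Here is a minimal self-contained sketch of that idea. The Row class and the FirstPerName helper are made up for illustration and are not part of the question's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetDedupDemo
{
    // Hypothetical stand-in for CubeReportRow; only the name matters here.
    public class Row
    {
        public string RowName { get; set; }
        public int Value { get; set; }
    }

    public static List<Row> FirstPerName(IEnumerable<Row> source)
    {
        var seen = new HashSet<string>();
        // Add returns true only the first time a name is inserted,
        // so later rows with the same name are filtered out.
        return source.Where(r => seen.Add(r.RowName)).ToList();
    }

    public static void Main()
    {
        var data = new[]
        {
            new Row { RowName = "A", Value = 1 },
            new Row { RowName = "B", Value = 2 },
            new Row { RowName = "A", Value = 3 },
        };

        // Prints "A 1" and "B 2"; the second "A" row is skipped.
        foreach (var row in FirstPerName(data))
            Console.WriteLine($"{row.RowName} {row.Value}");
    }
}
```

Note that the predicate has a side effect (it mutates the set), so the query should be materialized once with ToList rather than enumerated repeatedly.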

GroupBy

return _data.GroupBy(x => x.RowName).Select(x => x.First()).ToList();

IEqualityComparer

public class SomeComparer : IEqualityComparer<CubeReportRow> {
   public bool Equals(CubeReportRow x, CubeReportRow y) {
      return x.RowName == y.RowName;
   }
   public int GetHashCode(CubeReportRow obj) {
      return obj.RowName.GetHashCode();
   }
}

...

return _data.Distinct(new SomeComparer()).ToList();

Benchmarks

Config

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.746 (2004/?/20H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.102
  [Host]        : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Job=.NET Core 5.0  Runtime=.NET Core 5.0

Results

|     Method |     Mean |   Error |   StdDev |
|----------- |---------:|--------:|---------:|
| Dictionary | 205.3 us | 4.06 us |  5.69 us |
|    HashSet | 237.6 us | 4.73 us | 10.19 us |
|   Distinct | 299.4 us | 5.24 us |  4.90 us |
|    GroupBy | 451.3 us | 5.28 us |  4.68 us |

Full Test Code

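using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.NetCoreApp50)]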
public class Test
{
   private CubeReportRow[] _data;


   public class CubeReportRow
   {
      public string RowName { get; set; }
      public string RowParagraph { get; set; }
      public int ReportSection { get; set; }
   }

   [GlobalSetup]
   public void Setup()
   {
      var r = new Random(32);
      _data = new CubeReportRow[10000];
      for (int i = 0; i < 10000; i++)
         _data[i] = new CubeReportRow() {RowName = r.Next(100).ToString()};

   }

   [Benchmark]
   public List<CubeReportRow> Dictionary()
   {
      var rows = new Dictionary<string, CubeReportRow>();
      foreach (var dataRow in _data)
         if (!rows.ContainsKey(dataRow.RowName))
            rows.Add(dataRow.RowName, dataRow);
      return rows.Values.ToList();
   }

   [Benchmark]
   public List<CubeReportRow> HashSet()
   {
      var hashSet = new HashSet<string>(_data.Length);
      return _data.Where(x => hashSet.Add(x.RowName)).ToList();

   }

   public class SomeComparer : IEqualityComparer<CubeReportRow>
   {
      public bool Equals(CubeReportRow x, CubeReportRow y)
      {
         return x.RowName == y.RowName;
      }

      public int GetHashCode(CubeReportRow obj)
      {
         return obj.RowName.GetHashCode();
      }
   }

   [Benchmark]
   public List<CubeReportRow> Distinct()
   {
      return _data.Distinct(new SomeComparer()).ToList();

   }

   [Benchmark]
   public List<CubeReportRow> GroupBy()
   {
      return _data.GroupBy(x => x.RowName).Select(x => x.First()).ToList();

   }
}

Note: If you are interested in performance, run these benchmarks yourself with realistic data.

Upvotes: 2

Guru Stron

Reputation: 141990

You can use a Dictionary<string, CubeReportRow> for your rows variable and check whether the key (rowName) exists with ContainsKey:

var rows = new Dictionary<string, CubeReportRow>();
if (!rows.ContainsKey(rowName))
{
    // ...
    rows.Add(rowName, reportRow);
}

// ...

return rows.Values.ToList();
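
To make the shape of the whole loop concrete, here is a rough sketch of the method with the dictionary in place. It is simplified and rests on assumptions: the column layout (name, section, paragraph) is invented for the example, and the columnTransformer and cubeReportValueFactory calls from the question are left out because their code is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public class CubeReportRow
{
    public string RowName { get; set; }
    public string RowParagraph { get; set; }
    public int ReportSection { get; set; }
}

public static class DictionaryDedupDemo
{
    public static IList<CubeReportRow> TransformResults(DataTable dataTable)
    {
        // Keyed by row name, so the existence check is a hash lookup.
        var rows = new Dictionary<string, CubeReportRow>();

        foreach (DataRow row in dataTable.Rows)
        {
            var rowName = row[0].ToString();
            if (rows.ContainsKey(rowName))
                continue; // a row with this name was already added

            rows.Add(rowName, new CubeReportRow
            {
                RowName = rowName,
                // Assumed column layout for this sketch: name, section, paragraph.
                ReportSection = Convert.ToInt32(row[1]),
                RowParagraph = row[2].ToString()
            });
        }

        return rows.Values.ToList();
    }
}
```

One caveat: Dictionary does not document any guaranteed enumeration order, so if the output must follow the order of the DataTable, keep a separate List for the results and use the dictionary (or a HashSet<string>) only for the existence check.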

Upvotes: 2

zaitsman

Reputation: 9499

LINQ is perfect for this (in terms of easy-to-read code).

At the top of the file: using System.Linq;

Then: if (!rows.Any(r => r.RowName == rowName)) (replace if (!rows.Contains(rowName)))
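
For context, a small self-contained sketch of that check inside a loop might look like the following. The Dedupe helper and the cut-down CubeReportRow are just for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down version of the question's class; only RowName is needed here.
public class CubeReportRow
{
    public string RowName { get; set; }
}

public static class AnyDedupDemo
{
    public static List<CubeReportRow> Dedupe(IEnumerable<string> rowNames)
    {
        var rows = new List<CubeReportRow>();
        foreach (var rowName in rowNames)
        {
            // Any compares by the RowName property, which is what
            // rows.Contains(rowName) could not do with a string argument.
            if (!rows.Any(r => r.RowName == rowName))
                rows.Add(new CubeReportRow { RowName = rowName });
        }
        return rows;
    }
}
```

Keep in mind that Any scans the list on every call, so the total cost grows roughly with rows times distinct names. That is usually fine for small tables; for large ones a keyed lookup is likely the better fit.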

Upvotes: 1
