Reputation: 671
I have two big lists: list1 with 4 columns and list2 with 3 columns. If an item in list1 has the same values in column 1 and column 3 as an item in list2, then I need to remove that item from list1. I am looking for an efficient solution. Thank you for any help.
List1:
1, 5, 3, 9 // Remove this
11, 15, 18, 6 // Keep this
List2:
1, 5, 3
List<Tuple<int, int, int, int>> list1 = new List<Tuple<int, int, int, int>>();
List<Tuple<int, int, int>> list2 = new List<Tuple<int, int, int>>();
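For example, populated with the sample rows above:
list1.Add(Tuple.Create(1, 5, 3, 9));    // remove this
list1.Add(Tuple.Create(11, 15, 18, 6)); // keep this
list2.Add(Tuple.Create(1, 5, 3));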
Upvotes: 1
Views: 2159
Reputation: 56429
Well, ideally from a performance perspective you'd be able to leverage HashSet.SymmetricExceptWith, but you're using two different types (and Tuples at that).
Except is a possible solution:
list1 = list1.Except(list1
        .Where(l1 => list2
            .Any(l2 => l2.Item1 == l1.Item1
                    && l2.Item2 == l1.Item2
                    && l2.Item3 == l1.Item3)))
    .ToList();
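If the lists really are big, another option in the same spirit as the HashSet idea above is to build a set of list2's tuples once and do O(1) lookups instead of the nested Any scan. A minimal sketch (assuming, as in the code above, the match is on all three of list2's columns):
// Build the lookup set once; Tuple has structural equality, so Contains works as expected
var keys = new HashSet<Tuple<int, int, int>>(list2);
// Remove every list1 entry whose first three values appear in list2 (a sketch, not a drop-in for the Except query)
list1.RemoveAll(l1 => keys.Contains(Tuple.Create(l1.Item1, l1.Item2, l1.Item3)));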
Upvotes: 3
Reputation: 312
var index2 = list2.ToLookup(t => Tuple.Create(t.Item1, t.Item3));
//var index2 = list2.Select(l => Tuple.Create(l.Item1, l.Item3)).ToList();
//index2.Sort();
var results = from l in list1
              where !index2.Contains(Tuple.Create(l.Item1, l.Item3))
              select l;
This would probably be fairly efficient. The downside is the additional memory used by index2. An alternate method of indexing that would be a bit easier on memory is commented out: the ToList version wouldn't store references to your original records, so it would be more lightweight. But the ToLookup index could have more uses for you beyond this particular problem. ToDictionary would also be an option instead of ToLookup, if each key is unique, but that's a step backwards into the heavyweights.
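For example, the ToDictionary variant mentioned above might look like this (a sketch that assumes each (Item1, Item3) pair in list2 is unique, since ToDictionary throws on duplicate keys):
// Dictionary index keyed on (Item1, Item3); values are the original list2 records
var index2 = list2.ToDictionary(t => Tuple.Create(t.Item1, t.Item3));
var results = from l in list1
              where !index2.ContainsKey(Tuple.Create(l.Item1, l.Item3))
              select l;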
Additional gains could possibly be obtained with a couple of well-placed AsParallel() calls, depending on how big these lists truly are.
var index2 = list2.AsParallel().ToLookup(t => Tuple.Create(t.Item1, t.Item3));
var results = from l in list1.AsParallel()
              where !index2.Contains(Tuple.Create(l.Item1, l.Item3))
              select l;
Experiment with one or the other, or both; only your environment can tell you which is best. Sometimes the expense of splitting the work across multiple threads takes longer than just getting the job done serially.
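One quick way to check in your environment is a rough Stopwatch comparison (a sketch; run it a few times and ignore the first pass so JIT warm-up doesn't skew the numbers):
// Rough timing sketch, not a rigorous benchmark
var sw = System.Diagnostics.Stopwatch.StartNew();
var serial = list1.Where(l => !index2.Contains(Tuple.Create(l.Item1, l.Item3))).ToList();
sw.Stop();
Console.WriteLine("Serial:   " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var parallel = list1.AsParallel()
                    .Where(l => !index2.Contains(Tuple.Create(l.Item1, l.Item3)))
                    .ToList();
sw.Stop();
Console.WriteLine("Parallel: " + sw.ElapsedMilliseconds + " ms");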
Upvotes: 1