Reputation: 2354
I have two array..
var data1 = new[] {
new { Product = "Product 1", Year = 2009, Sales = 1212 },
new { Product = "Product 2", Year = 2009, Sales = 522 },
new { Product = "Product 1", Year = 2010, Sales = 1337 },
new { Product = "Product 2", Year = 2011, Sales = 711 },
new { Product = "Product 2", Year = 2012, Sales = 2245 },
new { Product = "Product 3", Year = 2012, Sales = 1000 }
};
var data2 = new[] {
new { Product = "Product 1", Year = 2009, Sales = 1212 },
new { Product = "Product 1", Year = 2010, Sales = 1337 },
new { Product = "Product 2", Year = 2011, Sales = 711 },
new { Product = "Product 2", Year = 2012, Sales = 2245 }
};
What I want to do is check for each distinct Product
and Year
in data2
, and if any row exists for any combination of such Product
and Year
in data1
but not in data2
then add that row to data2
.
Example..
In data2
, distinct products are Product1
and Product2
and distinct years are Year1
, Year2
, Year3
and Year4
.
In data1 there exists a row { Product = "Product 2", Year = 2009, Sales = 522 }
, which is not present in data2
, so I wish to add it to data2
.
What I can do is get distinct products and years in two variables.
Then do a for each loop in both and check if combination exists in data1 but not in data2 and if so add it to data2.
What I would like to get is a single LINQ query which can do this job for me rather than doing two distinct separately and then doing a couple of for each loop.
Thanks
Upvotes: 1
Views: 101
Reputation: 26926
I won't make any claims for efficiency, but it is possible in a single query.
If you are content to let Union handle removing duplicates, you can do:
var newd2 = data2.Union(
from d1 in data1
where
(from d2p in data2 from d2y in data2
select new { d2p.Product, d2y.Year })
.Distinct().Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
select d1);
Alternatively, you can exclude pre-existing data2 matches and use Concat
var newd2 = data2.Concat(
from d1 in data1
where
(from d2p in data2 from d2y in data2 select new { d2p.Product, d2y.Year })
.Distinct().Any(mp => mp.Product == d1.Product && mp.Year == d1.Year) &&
!data2.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
select d1
);
OTOH, I couldn't resist some timings. If we call using Union as 1, using Concat varies from 73% of the time, creating HashSets uses 827% of the time and pulling the unique pair set out takes 54%, and skipping the .Distinct() takes 27%, though the dataset is too slow to tell differences in some of these.
Pulling out the pairs and dumping Distinct:
var newdd = (from d2p in data2 from d2y in data2 select new { d2p.Product, d2y.Year });
var newd2 = data2.Concat(
from d1 in data1
where
newdd.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year) &&
!data2.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
select d1
);
Upvotes: 1
Reputation: 726919
You can get this to work in a single query. However it is going to be sub-optimal, because for each item in data1
you would need to check three conditions, which potentially require going through the entire data2
for an O(m*n) time complexity (space complexity remains O(1), though).
You can avoid identical loop, though:
var uniqueProd = new HashSet<string>(data2.Select(d=>d.Product));
var uniqueYear = new HashSet<int>(data2.Select(d=>d.Year));
var knownPairs = new HashSet<Tuple<string,int>>(
data2.Select(d=>Tuple.Create(d.Product, d.Year))
);
var newData2 = data2.Concat(
data1.Where(d =>
uniqueProd.Contains(d.Product) // The product is there
&& uniqueYear.Contains(d.Year) // The year is there
&& !knownPairs.Contains(Tuple.Create(d.Product, d.Year)) // Combination is not there
)
).ToArray();
This solution is O(m+n) in time and also O(n) in space.
Upvotes: 2