Arnab
Arnab

Reputation: 2354

Adding rows from one IEnumerable to another based on conditions

I have two array..

var data1 = new[] { 
        new { Product = "Product 1", Year = 2009, Sales = 1212 },
        new { Product = "Product 2", Year = 2009, Sales = 522 },
        new { Product = "Product 1", Year = 2010, Sales = 1337 },
        new { Product = "Product 2", Year = 2011, Sales = 711 },
        new { Product = "Product 2", Year = 2012, Sales = 2245 },
        new { Product = "Product 3", Year = 2012, Sales = 1000 }
    };

var data2 = new[] { 
        new { Product = "Product 1", Year = 2009, Sales = 1212 },
        new { Product = "Product 1", Year = 2010, Sales = 1337 },
        new { Product = "Product 2", Year = 2011, Sales = 711 },
        new { Product = "Product 2", Year = 2012, Sales = 2245 }
    };

What I want to do is check for each distinct Product and Year in data2, and if any row exists for any combination of such Product and Year in data1 but not in data2 then add that row to data2.

Example.. In data2, distinct products are Product1 and Product2 and distinct years are Year1, Year2, Year3 and Year4.

In data1 there exists a row { Product = "Product 2", Year = 2009, Sales = 522 }, which is not present in data2, so I wish to add it to data2.

What I can do is get distinct products and years in two variables.

Then do a for each loop in both and check if combination exists in data1 but not in data2 and if so add it to data2.

What I would like to get is a single LINQ query which can do this job for me rather than doing two distinct separately and then doing a couple of for each loop.

Thanks

Upvotes: 1

Views: 101

Answers (2)

NetMage
NetMage

Reputation: 26926

I won't make any claims for efficiency, but it is possible in a single query.

If you are content to let Union handle removing duplicates, you can do:

var newd2 = data2.Union(
    from d1 in data1
    where
        (from d2p in data2 from d2y in data2 
         select new { d2p.Product, d2y.Year })
        .Distinct().Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
    select d1);

Alternatively, you can exclude pre-existing data2 matches and use Concat

var newd2 = data2.Concat(
    from d1 in data1
    where
        (from d2p in data2 from d2y in data2 select new { d2p.Product, d2y.Year })
        .Distinct().Any(mp => mp.Product == d1.Product && mp.Year == d1.Year) &&
        !data2.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
    select d1
    );

OTOH, I couldn't resist some timings. If we call using Union as 1, using Concat varies from 73% of the time, creating HashSets uses 827% of the time and pulling the unique pair set out takes 54%, and skipping the .Distinct() takes 27%, though the dataset is too slow to tell differences in some of these.

Pulling out the pairs and dumping Distinct:

var newdd = (from d2p in data2 from d2y in data2 select new { d2p.Product, d2y.Year });
var newd2 = data2.Concat(
    from d1 in data1
    where
        newdd.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year) &&
        !data2.Any(mp => mp.Product == d1.Product && mp.Year == d1.Year)
    select d1
    );

Upvotes: 1

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726919

You can get this to work in a single query. However it is going to be sub-optimal, because for each item in data1 you would need to check three conditions, which potentially require going through the entire data2 for an O(m*n) time complexity (space complexity remains O(1), though).

You can avoid identical loop, though:

var uniqueProd = new HashSet<string>(data2.Select(d=>d.Product));
var uniqueYear = new HashSet<int>(data2.Select(d=>d.Year));
var knownPairs = new HashSet<Tuple<string,int>>(
    data2.Select(d=>Tuple.Create(d.Product, d.Year))
);
var newData2 = data2.Concat(
    data1.Where(d =>
        uniqueProd.Contains(d.Product)                       // The product is there
    &&  uniqueYear.Contains(d.Year)                          // The year is there
    && !knownPairs.Contains(Tuple.Create(d.Product, d.Year)) // Combination is not there
    )
).ToArray();

This solution is O(m+n) in time and also O(n) in space.

Upvotes: 2

Related Questions