aterbo
aterbo

Reputation: 474

Returning one of each object based on list of Ids and other property using EF/Linq

I have an SQL table with a list of objects and I am trying to return a list using a number of different criteria.

Here is my object (same as the SQL table):

class Photo{
        public int Id { get; set; }
        public string FileName { get; set; }
        public DateTime Uploaded { get; set; }
        public int? ProjectId { get; set; }
        public int GalleryOrder { get; set; }
}  

In the SQL table, there are lots of photos uploaded over time that may or may not be tagged with ProjectIds. There are numerous photos tagged to each ProjectId. They are uploaded in batches, so that there may be numerous photos with the same DateTime Uploaded and then organized with GalleryOrder.

Given a list of ProjectIds I am trying to return one representative photo for each project based on the following parameters:

I have some code that appears to be working with my test database of ~20 entries, but it pulls excess data and has numerous sorts. I'm not sure how to simplify and optimize it.

Here is my current query:

public async Task<List<Photo>> GetOneImageFilesPerProject(List<int> projectIds)
{
    using var context = _contextFactory.CreateDbContext();

    var results = await context.Photos.Where(x => x.ProjectId != null 
                                      && projectIds.Contains((int)x.ProjectId))
                                      .ToListAsync();
                                             
    results = results.OrderBy(x => x.ProjectId)
                     .ThenByDescending(x => x.Uploaded)
                     .ThenBy(x => x.GalleryOrder)
                     .GroupBy(x => x.ProjectId)
                     .Select(x => x.First())
                     .ToList();

    return results;
}

The program compiles with the OrderBy, GroupBy, and Select calls added after the .Where call, but it seemed to hang at that point and never return the final list. That's why it's split into two processing steps.

My other thought was to get the list from the Db and then go through with a foreach loop to build the final list. Not sure if that's faster than using .GroupBy and .Select. Either way seems a bit inelegant and brute force. If there's a straight SQL query that would be a better solution, I'm open to it!

Is there some method that is more straightforward for returning a list with one unique item based on other criteria within the object list?

For scale, this won't be a massive app, but it will be looking for ~10-20 projectIds at a time and each project might have ~40-50 photos tagged to it. Many cases will be a bit smaller (2-5 photos) but some larger, (100-200 photos). This function will be run by the users very frequently, as in several times a day, and the Photo list will be changing frequently.

Edit: Using .NET 5 and EF Core 5

Upvotes: 1

Views: 7740

Answers (3)

Steve Py
Steve Py

Reputation: 34773

EF Core unfortunately hasn't yet caught up to EF6 which can manage to build a cross apply from a GroupBy expression where you want to extract a specific row from the grouped results. Requests for this support have been logged since EF Core 3.1 or possibly even earlier and AFAIK they still haven't been incorporated. (https://github.com/dotnet/efcore/issues/12088)

The typical work-around was to use the GroupBy expression to get a unique representation of values to join back in on the table. This would require a bit of an assumption given you have a Date and a Sequence Number (GalleryOrder) in that we'd need to assume that the GalleryOrder always starts at 1, and these items are not deleted. Otherwise you can use a query to get close to find all photos for each applicable order and date but ultimately would need to do the final selection for lowest gallery from memory:

List<int> projectIds = new[] { 1, 2 }.ToList(); // Just for testing...

var photos = context.Photos
    .Where(x => x.ProjectId.HasValue && projectIds.Contains(x.ProjectId.Value))
    .GroupBy(x => x.ProjectId)
    .Select(g => new
    {
        ProjectId = g.Key,
        Uploaded = g.Max(x => x.Uploaded)
    }).Join(context.Photos.Where(x => x.GalleryOrder == 1), 
        x => x, 
        x => new { x.ProjectId, x.Uploaded }
        , (_, r) => r).ToList();

If Photos can be deleted where we cannot guarantee that there will be a GalleryOrder of 1, or we want to switch to something like the highest gallery order then the final selection will need to be done in memory. (Edit: Removed 2nd option until I have a chance to test the updated grouping as it wasn't addressing the latest date.)

Edit2: Ok, I had a play with the idea behind the second query. It is possible to get the required data via Linq in one operation, however depending on the amount of data we are talking about and the # of projects being selected it may be more practical to load all photos for the selected projects and do the grouping/min-maxing using Linq2Object where the in-group selection can take place. (Until possibly EF Core 6 re-enables this capability)

            var photos = context.Photos
                .Where(x => x.ProjectId.HasValue && projectIds.Contains(x.ProjectId.Value))
                .GroupBy(x => new { x.ProjectId, x.Uploaded })
                .Select(g => new
                {
                    ProjectId = g.Key.ProjectId,
                    Uploaded = g.Key.Uploaded,
                    GalleryOrder = g.Min(x => x.GalleryOrer)
                }).Join(context.Photos, x => x, x => new { x.ProjectId, x.Uploaded, x.GalleryOrder }
                , (_, r) => r)
                .GroupBy(x => new { x.ProjectId, x.GalleryOrder })
                .Select(g => new
                {
                    ProjectId = g.Key.ProjectId,
                    Uploaded = g.Max(x => x.Uploaded),
                    GalleryOrder = g.Key.GalleryOrder
                }).Join(context.Photos, x => x, x => new { x.ProjectId, x.Uploaded, x.GalleryOrder }
                , (_, r) => r)
                .ToList();

Something to consider if there are a large-ish # of project IDs to retrieve, and/or a large-ish number of photos per project. Or load photos per applicable project, batching project IDs if needed, and getting the appropriate one using a Linq2Object ordered grouping.

Upvotes: 1

Caius Jard
Caius Jard

Reputation: 74605

EFC5 will translate this, so long as you have a navigation property in Projects that contains a list of Photos:

var some = new [] {1,2,3}.ToList();
var v = await context.Projects
  .Where(p => some.Contains(p.ProjectId))
  .Select(p => p.Photos
      .OrderByDescending(ph => ph.Uploaded)
      .ThenBy(ph => ph.Gallery)
      .First()
  ).ToListAsync();

It translates to something like:

SELECT y.*
FROM 
  Projects p
  LEFT JOIN
  ( 
    SELECT *
    FROM
      (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY Uploaded DESC, Gallery) rn 
        FROM Photos
      ) x
    WHERE rn = 1
  ) y
WHERE p.ProjectId IN (1,2,3)

Which should be fairly performant. You can also shift the predicate to inside the Select:

var some = new [] {1,2,3}.ToList();
var v = await context.Projects
  .Select(p => p.Photos 
      .Where(p => some.Contains(p.ProjectId))
      .OrderByDescending(ph => ph.Uploaded)
        .ThenBy(ph => ph.Gallery)
     .First()
  )
  .ToListAsync();

Which will generate the same query but with the IN moved to the inner query:

SELECT y.*
FROM 
  Projects p
  LEFT JOIN
  ( 
    SELECT *
    FROM
      (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY Uploaded DESC, Gallery) rn 
        FROM Photos
        WHERE p.ProjectId IN (1,2,3)
      ) x
    WHERE rn = 1
  ) y

It doesn't really matter; SQLS should execute these the same

So "what does starting from Projects do for us?" you may ask..

..well, it essentially gives you the "per project" criteria: it causes EF to write PARTITION BY ProjectId which is crucial to getting the image per project. If you tried just straight from context.Orders EF wouldn't understand the "per project" part


You can also always ExecuteRaw this instead:

    var ids = new[]{1,2,3};
    var idsStr = string.Join(",", ids);

    //yes, raw is intended here because of the IN, not interpolated, even though it's an interpstring
    context.Photos.FromSqlRaw($@"SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY Uploaded DESC, Gallery) rn 
  FROM Photos
  WHERE ProjectId IN ({idsStr})
)
WHERE rn = 1").ToList();

Or if concatting values into the raw makes you nervous (it should, though there's not much scope for SQL injection with an array of ints) you could compose on top of the raw:

    var ids = new[]{1,2,3}.ToList();

    context.Photos.FromSqlRaw($@"SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY Uploaded DESC, Gallery) rn 
  FROM Photos
)
WHERE rn = 1").Where(p => ids.Contains(p.ProjectId));

EF will bundle the raw up as a subquery but SQLS should be able to push the projectid predicate down into the innermost query and essentially execute this raw the same as the other one

Upvotes: 0

Svyatoslav Danyliv
Svyatoslav Danyliv

Reputation: 27282

Try this query, it should work with EF Core 5

public async Task<List<Photo>> GetOneImageFilesPerProject(List<int> projectIds)
{
    using var context = _contextFactory.CreateDbContext();

    var photos = context.Photos.Where(x => x.ProjectId != null 
                                      && projectIds.Contains((int)x.ProjectId));

    var query = 
        from dp in photos.Select(x => new { x.ProjectId }).Distinct()
        from p in photos.Where(p => p.ProjectId == dp.ProjectId)
            .OrderByDescending(p => p.Uploaded)
            .ThenBy(p => p.GalleryOrder)
            .Take(1)
        select p;

    var results = await query.ToListAsync();
                                        
    return results;
}

Upvotes: 0

Related Questions