remi bourgarel
remi bourgarel

Reputation: 9389

How to use GroupBy on an index in RavenDB?

I have this document, a post :

{Content:"blabla",Tags:["test","toto"], CreatedOn:"2019-05-01 01:02:01"}

I want to have a page that displays themost used tags since the last 30 days.

So far I tried to create an index like this

 public class Toss_TagPerDay : AbstractIndexCreationTask<TossEntity, TagByDayIndex>
{
    public Toss_TagPerDay()
    {

        Map = tosses => from toss in tosses
                        from tag in toss.Tags
                        select new TagByDayIndex()
                        {
                            Tag = tag,
                            CreatedOn = toss.CreatedOn.Date,
                            Count = 1
                        };
        Reduce = results => from result in results
                            group result by new { result.Tag, result.CreatedOn }
                            into g
                            select new TagByDayIndex()
                            {
                                Tag = g.Key.Tag,
                                CreatedOn = g.Key.CreatedOn,
                                Count = g.Sum(i => i.Count)
                            };
    }
}

And I query it like that

 await _session
            .Query<TagByDayIndex, Toss_TagPerDay>()
            .Where(i => i.CreatedOn >= firstDay)
            .GroupBy(i => i.Tag)
            .OrderByDescending(g => g.Sum(i => i.Count))
            .Take(50)
            .Select(t => new BestTagsResult()
            {
                CountLastMonth = t.Count(),
                Tag = t.Key
            })
            .ToListAsync()

But this gives me the error

Message: System.NotSupportedException : Could not understand expression: from index 'Toss/TagPerDay'.Where(i => (Convert(i.CreatedOn, DateTimeOffset) >= value(Toss.Server.Models.Tosses.BestTagsQueryHandler+<>c__DisplayClass3_0).firstDay)).GroupBy(i => i.Tag).OrderByDescending(g => g.Sum(i => i.Count)).Take(50).Select(t => new BestTagsResult() {CountLastMonth = t.Count(), Tag = t.Key}) ---- System.NotSupportedException : GroupBy method is only supported in dynamic map-reduce queries

Any idea how can I make this work ? I could query for all the index data from the past 30 days and do the groupby / order / take in memory but this could make my app load a lot of data.

Upvotes: 2

Views: 1073

Answers (4)

arek.palinski
arek.palinski

Reputation: 131

@Kuepper

Based on your index definition. You can handle that by the following index:

public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
    public TrendingSongs()
    {
        Map = events => from e in events
                        where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
                        select new Result
                        {
                            TrackId = e.TrackId,
                            Count = 1,
                Timestamp = new DateTime(e.TimestampStart.Year, e.TimestampStart.Month, e.TimestampStart.Day) 
                        };
        Reduce = results => from r in results
                            group r by new {r.TrackId, r.Timestamp}
                            into g
                            select new Result
                            {
                                TrackId = g.Key.TrackId,
                                Count = g.Sum(x => x.Count),
                Timestamp = g.Key.Timestamp
                            };
    }
}

and the query using facets:

from index TrendingSongs where Timestamp between $then and $now select facet(TrackId, sum(Count))

Upvotes: 3

Kuepper
Kuepper

Reputation: 1004

I solved a similar problem, by using AdditionalSources that uses dynamic values. Then I update the index every morning to increase the Earliest Timestamp. await IndexCreation.CreateIndexesAsync(new AbstractIndexCreationTask[] {new TrendingSongs()}, _store);

I still have to try it in production, but my tests so far look like it's a lot faster than the alternatives. It does feel pretty hacky though and I'm surprised RavenDB does not offer a better solution.

public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
    public DateTime Earliest = DateTime.UtcNow.AddDays(-16);
    public TrendingSongs()
    {
        Map = events => from e in events
                        where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
                              && e.TimestampStart > new DateTime(TrendingHelpers.Year, TrendingHelpers.Month, TrendingHelpers.Day)
                        select new Result
                        {
                            TrackId = e.TrackId,
                            Count = 1
                        };
        Reduce = results => from r in results
                            group r by new {r.TrackId}
                            into g
                            select new Result
                            {
                                TrackId = g.Key.TrackId,
                                Count = g.Sum(x => x.Count)
                            };
        AdditionalSources = new Dictionary<string, string>
        {
            {
                "TrendingHelpers",
                @"namespace Helpers
{
    public static class TrendingHelpers
    {
        public static int Day = "+Earliest.Day+@";
        public static int Month = "+Earliest.Month+@";
        public static int Year = "+Earliest.Year+@";
    }
}"
            }
        };
    }
}

Upvotes: 0

arek.palinski
arek.palinski

Reputation: 131

The results from the map-reduce index you created will give you the number of tags per day. You want to have the most popular ones from the last 30 days so you need to do the following query:

var tagCountPerDay = session
                .Query<TagByDayIndex, Toss_TagPerDay>()
                .Where(i => i.CreatedOn >= DateTime.Now.AddDays(-30))
                .ToList();

Then you can the the client side grouping by Tag:

var mostUsedTags = tagCountPerDay.GroupBy(x => x.Tag)
                        .Select(t => new BestTagsResult()
                        {
                            CountLastMonth = t.Count(),
                            Tag = t.Key
                        })
                        .OrderByDescending(g => g.CountLastMonth)
                        .ToList();

Upvotes: 2

Danielle
Danielle

Reputation: 3839

The reason for the error is that you can't use 'GroupBy' in a query made on an index.

'GroupBy' can be used when performing a 'dynamic query',

i.e. a query that is made on a collection, without specifying an index.

See:

https://ravendb.net/docs/article-page/4.1/Csharp/client-api/session/querying/how-to-perform-group-by-query

Upvotes: 1

Related Questions