skipper
skipper

Reputation: 171

C# ElasticSearch get external objects by nested list element property comparison

Let's suppose we have the following 2 objects (Please ignore the strange situation, as it is used only to model the real issue):

public class Person
{
    public string FullName { get; set; }
    public decimal TotalSalary { get; set; }
    public List<Position> Positions { get; set; } = new List<Position>();
}

public class Position
{
    public string Name { get; set; }
    public bool IsCurrentPosition { get; set; }
    public DateTime ExpirationDate { get; set; }
    public int? MonthsWorked { get; set; }
    public bool HasAverageLimitStatistic { get; set; } = false;
    public int AverageMonthLimit { get; set; } = 0;
    public decimal Salary { get; set; }
    public string CompanyName { get; set; }
}

How do I write a query in NEST (C#) to find all people who work (or worked) on positions which are coming close to AverageMonthLimit in a timespan of a year, (positions without AverageMonthLimit are ignored); in case MonthsWorked is provided than compare it to AverageMonthLimit, otherwise take today's date and compare it to ExpirationDate.

PS: If the position has the statistic, then minimal AverageMonthLimit is of 36 months. People might work on several jobs.

NOTE: MSSQL script would probably look like:

SELECT * 
FROM Person p
JOIN Position pos
    ON pos.PersonId = p.Id
WHERE
    pos.HasAverageLimitStatistic = 1
        AND pos.AverageMonthLimit > pos.MonthsWorked
        AND pos.AverageMonthLimit - 12 < pos.MonthsWorked
    OR pos.HasAverageLimitStatistic = 0
        AND DATEDIFF(MONTH, GETDATE(), pos.ExpirationDate) > 0
        AND DATEDIFF(MONTH, GETDATE(), pos.ExpirationDate) < 12

UPDATE

Correct me if I'm wrong, but I presume the question boils down to the following, where query is unknown:

var response = _elasticClient.Search<Person>(s => s.Query(q =>
    .DateRange(x => x
    .Field(p => p.Employees
        .First(pos => pos.HasAverageLimitStatistic == null)
        .ExpirationDate)
    .GreaterThan(DateTime.Now)
    .LessThan(DateTime.Now.AddYear(1))) 
    // upper code finds customers who don't have statistic
|| q.Nested(x => x
    .Path(p => p.Positions)
    .Query(qq => qq.Script.Inline(" ?????? "))));

Upvotes: 1

Views: 1135

Answers (1)

Russ Cam
Russ Cam

Reputation: 125528

When querying against nested types, you need to use nested queries, which will take care of querying against the internal documents that nested types are modelled with.

Here's what your SQL query would look like in Elasticsearch

var searchResponse = client.Search<Person>(s => s
    .Query(q => q
        .Nested(n => n
            .Path(p => p.Positions)
            .Query(nq => (+nq
                .Term(f => f.Positions.First().HasAverageLimitStatistic, true) && +nq
                .Script(sq => sq
                    .Inline("doc['positions.monthsWorked'].value != 0 && doc['positions.averageMonthLimit'].value > doc['positions.monthsWorked'].value")
                ) && +nq
                .Script(sq => sq
                    .Inline("doc['positions.monthsWorked'].value != 0 && (doc['positions.averageMonthLimit'].value - 12) < doc['positions.monthsWorked'].value")
                )) || (+nq 
                .Term(f => f.Positions.First().HasAverageLimitStatistic, false) && +nq
                .DateRange(d => d
                    .Field(f => f.Positions.First().ExpirationDate)
                    .GreaterThan(DateTime.Now.Date)
                    .LessThan(DateTime.Now.Date.AddYears(1))
                ))
            )
        )
    )
);

There's a fair bit going here, so I'll break it down

  1. Inside the top level query, a nested query will be performed. Since all predicates (i.e. WHERE clauses) are based on properties of nested types, these queries will all be inside of the nested query.
  2. Specify the Path to the nested type on the top level type
  3. Inside the nested query's query, 3 queries are &&ed together within parentheses for the first set of predicates where HasAverageLimitStatistic is true and 3 queries are &&ed together in parentheses for the second set of predicates where HasAverageLimitStatistic is false.
  4. Since all queries test whether some condition is true or not, we can forgo calculating a score for them by making them filter clauses in a bool query. NEST has handy shorthand for bool query filter clauses in the form of the unary + operator. Similarly, there are overloaded operators combining queries with && and ||
  5. Because you want to compare two properties on an indexed document, you need to use a script query to do this. In Elasticsearch 5.x, the default language for this is Painless.

The query DSL JSON for the above query is

{
  "query": {
    "nested": {
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "positions.hasAverageLimitStatistic": {
                        "value": true
                      }
                    }
                  },
                  {
                    "script": {
                      "script": {
                        "inline": "doc['positions.monthsWorked'].value != 0 && doc['positions.averageMonthLimit'].value > doc['positions.monthsWorked'].value"
                      }
                    }
                  },
                  {
                    "script": {
                      "script": {
                        "inline": "doc['positions.monthsWorked'].value != 0 && (doc['positions.averageMonthLimit'].value - 12) < doc['positions.monthsWorked'].value"
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "positions.hasAverageLimitStatistic": {
                        "value": false
                      }
                    }
                  },
                  {
                    "range": {
                      "positions.expirationDate": {
                        "gt": "2017-06-03T00:00:00-07:00",
                        "lt": "2018-06-03T00:00:00-07:00"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "path": "positions"
    }
  }
}

Finally, here's a complete example to demonstrate

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    var connectionSettings = new ConnectionSettings(pool)
        .InferMappingFor<Person>(m => m
            .IndexName("people")
        )
        .PrettyJson()
        .DisableDirectStreaming()
        .OnRequestCompleted(response =>
            {
                // log out the request
                if (response.RequestBodyInBytes != null)
                {
                    Console.WriteLine(
                        $"{response.HttpMethod} {response.Uri} \n" +
                        $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
                }
                else
                {
                    Console.WriteLine($"{response.HttpMethod} {response.Uri}");
                }

                Console.WriteLine();

                // log out the response
                if (response.ResponseBodyInBytes != null)
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                             $"{new string('-', 30)}\n");
                }
                else
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{new string('-', 30)}\n");
                }
            });

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists("people").Exists)
    {
        client.DeleteIndex("people");
    }

    client.CreateIndex("people", c => c
        .Mappings(m => m
            .Map<Person>(mm => mm
                .AutoMap()
                .Properties(p => p
                    .Nested<Position>(n => n
                        .Name(nn => nn.Positions)
                        .AutoMap()
                    )
                )
            )
        )
    );

    client.IndexMany(new[] {
        new Person
        {
            FullName = "Person 1",
            TotalSalary = 100000,
            Positions = new List<Position>
            {
                new Position
                {
                    Name = "Janitor",
                    AverageMonthLimit = 5,
                    MonthsWorked = 3,
                    HasAverageLimitStatistic = true
                }
            }
        },
        new Person
        {
            FullName = "Person 2",
            TotalSalary = 150000,
            Positions = new List<Position>
            {
                new Position
                {
                    Name = "Coach",
                    AverageMonthLimit = 5,
                    HasAverageLimitStatistic = true
                }
            }
        },
        new Person
        {
            FullName = "Person 3",
            TotalSalary = 200000,
            Positions = new List<Position>
            {
                new Position
                {
                    Name = "Teacher",
                    HasAverageLimitStatistic = false,
                    ExpirationDate = DateTime.Now.AddMonths(6)
                }
            }
        },              
        new Person
        {
            FullName = "Person 4",
            TotalSalary = 250000,
            Positions = new List<Position>
            {
                new Position
                {
                    Name = "Head",
                    HasAverageLimitStatistic = false,
                    ExpirationDate = DateTime.Now.AddYears(2)
                }
            }
        }
    });


    client.Refresh(IndexName.From<Person>());

    var searchResponse = client.Search<Person>(s => s
        .Query(q => q
            .Nested(n => n
                .Path(p => p.Positions)
                .Query(nq => (+nq
                    .Term(f => f.Positions.First().HasAverageLimitStatistic, true) && +nq
                    .Script(sq => sq
                        .Inline("doc['positions.monthsWorked'].value != 0 && doc['positions.averageMonthLimit'].value > doc['positions.monthsWorked'].value")
                    ) && +nq
                    .Script(sq => sq
                        .Inline("doc['positions.monthsWorked'].value != 0 && (doc['positions.averageMonthLimit'].value - 12) < doc['positions.monthsWorked'].value")
                    )) || (+nq 
                    .Term(f => f.Positions.First().HasAverageLimitStatistic, false) && +nq
                    .DateRange(d => d
                        .Field(f => f.Positions.First().ExpirationDate)
                        .GreaterThan(DateTime.Now.Date)
                        .LessThan(DateTime.Now.Date.AddYears(1))
                    ))
                )
            )
        )
    );

}

public class Person
{
    public string FullName { get; set; }
    public decimal TotalSalary { get; set; }
    public List<Position> Positions { get; set; } = new List<Position>();
}

public class Position
{
    public string Name { get; set; }
    public bool IsCurrentPosition { get; set; }
    public DateTime ExpirationDate { get; set; }
    public int? MonthsWorked { get; set; }
    public bool HasAverageLimitStatistic { get; set; } = false;
    public int AverageMonthLimit { get; set; } = 0;
    public decimal Salary { get; set; }
    public string CompanyName { get; set; }
}

Based on the query criteria, only Person 1 and Person 3 are returned.

Upvotes: 1

Related Questions