Elasticsearch - nested types vs collapse/aggs

I have a use case where I need to find the latest data based on some fields.

The fields are:

For example: search for the newest data where category.name = 'John G.' AND category.level = 'A'. I expect the document with ID = 1, since it matches the criteria and is the newest one based on the createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200").

The problem is that category.* is a nested field, and I can't use aggs/collapse on these fields because ES doesn't support it.

Mapping:

PUT data
{
  "mappings": {
    "properties": {
      "createdAt": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "category": {
        "type": "nested",
        "properties": {
          "name": {
            "type":   "text",
            "analyzer": "keyword"
          }
        }
      },
      "approved": {
        "type":   "text",
        "analyzer": "keyword"
      }
    }
  }
}

Data:

POST data/_create/1
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "no"
}
    
POST data/_create/2
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "Max",
  "createdAt": "2022-04-10 10:09:27.527+0200",
  "approved": "no"
}

POST data/_create/3
{  
  "category": [
      {
        "name": "Rick J.",
        "level": "B"
      }
  ],
  "createdBy": "Rick",
  "createdAt": "2022-03-02 02:09:27.527+0200",
  "approved": "no"
}

I'm looking for either a search query that can handle this in an acceptably performant way, or a new object design without the nested type where I could take advantage of the aggs/collapse features.

Any suggestion will be really appreciated.

Upvotes: 0

Views: 271

Answers (1)

Paulo


About your first question,

For example: search for the newest data where category.name = 'John G.' AND category.level = 'A'. I expect the document with ID = 1, since it matches the criteria and is the newest one based on the createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200").

I believe you can do something along these lines:

GET /data/_search
{
  "query": {
    "nested": {
      "path": "category",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "category.name": "John G."
              }
            },
            {
              "match": {
                "category.level": "A"
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "createdAt": {
        "order": "desc"
      }
    }
  ],
  "size":1
}
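
If what you actually need is the newest document for every category rather than for a single one, a nested aggregation might get you there without changing the document shape. This is only a sketch under assumptions: it assumes category.name is mapped as keyword (with the text mapping from the question, a terms aggregation would need fielddata enabled), and the aggregation names categories, by_name, back_to_root and latest are made up for illustration. The reverse_nested step joins back from the nested objects to the root documents so that top_hits can sort on createdAt.

```json
GET /data/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "nested": { "path": "category" },
      "aggs": {
        "by_name": {
          "terms": { "field": "category.name", "size": 100 },
          "aggs": {
            "back_to_root": {
              "reverse_nested": {},
              "aggs": {
                "latest": {
                  "top_hits": {
                    "size": 1,
                    "sort": [ { "createdAt": { "order": "desc" } } ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Each by_name bucket should then carry one hit, the most recent root document containing that category name. The terms size of 100 is an arbitrary cap, so adjust it to the number of categories you expect.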

For the 2nd matter, it really depends on what you are aiming to do. You could merge category.name and category.level into a single field, so that your document would look like:

{  
  "category": ["John G. A","Chris T. A"],
  "createdBy": "Max",
  "createdAt": "2022-04-10 10:09:27.527+0200",
  "approved": "no"
}

No nested type is needed anymore, although I agree it feels like using duct tape to fix your issue.
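
To make the merged-field idea concrete, here is a rough sketch of how the mapping and a "latest per category" query could look. The index name data_flat and the aggregation names by_category and latest are hypothetical, and mapping category and approved as keyword is my assumption, not something from your original mapping. One caveat: as far as I know, collapse only works on a single-valued keyword or numeric field, so it would not apply to a category array like the one above. A terms aggregation with a top_hits sub-aggregation does work on multi-valued keyword fields.

```json
PUT /data_flat
{
  "mappings": {
    "properties": {
      "createdAt": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "category": { "type": "keyword" },
      "approved": { "type": "keyword" }
    }
  }
}

GET /data_flat/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category", "size": 100 },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "createdAt": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
```

For the single-category lookup, a term query on category with the value "John G. A", sorted by createdAt descending with size 1, would replace the nested query.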

Upvotes: 1
