analystic

Reputation: 351

Elasticsearch - Recursive nested JSON object

I'm trying to parse an HTML document into a nested set of tags and content. It needs to support arbitrary nesting depth. The object (created in Python code) looks like this:

{
  "content": [
    "some text about a thing, ",
    {
      "content": "More text with additional set of tags ",
      "tags": ["strong"]
    }
  ],
  "tags": ["p"]
}

ES seems to dislike this structure because the content field holds both text and object values, which produces this error: "reason": "mapper [content] of different type, current_type [text], merged_type [ObjectMapper]"

Does anyone have any ideas on how to index this type of object while still allowing searches on both tags and content? Ideally I'd also like to search by the tags associated with the ancestors of a given object. I can reformat it to:

{
  "content": [
    {"content": "some text about a thing, "},
    {
      "content": "More text with a different set of tags ",
      "tags": ["strong"]
    }
  ],
  "tags": ["p"]
}

But then searching isn't very effective as I need to write content.content:"search string" to get results, which will become hard with multiple levels of nesting.

Upvotes: 0

Views: 1837

Answers (1)

ibexit

Reputation: 3667

Why not store the ancestor tags in a separate field? Implementing a nested set should solve your problem too.
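
As a rough illustration of the separate-field idea, the nested object from the question could be flattened into one document per text fragment before indexing. This is only a sketch: the function name `flatten` and the field names `text` and `ancestor_tags` are made up for this example, and it assumes every node has a `content` key holding either a string or a list of strings and child nodes.

```python
def flatten(node, ancestor_tags=()):
    """Yield one flat document per text fragment in the nested structure."""
    tags = list(node.get("tags", []))
    content = node["content"]
    # Content may be a single string or a list of strings and child nodes.
    children = content if isinstance(content, list) else [content]
    for child in children:
        if isinstance(child, str):
            yield {
                "text": child,
                "tags": tags,
                # Hypothetical field: tags of every enclosing node, outermost first.
                "ancestor_tags": list(ancestor_tags),
            }
        else:
            # Descend, adding this node's tags to the ancestor list.
            yield from flatten(child, tuple(ancestor_tags) + tuple(tags))


doc = {
    "content": [
        "some text about a thing, ",
        {"content": "More text with additional set of tags ", "tags": ["strong"]},
    ],
    "tags": ["p"],
}

flat_docs = list(flatten(doc))
```

Each resulting document has a fixed shape, so `text` is always a string and the mapping conflict from the question should not arise; searching ancestors then means querying `ancestor_tags` directly.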

Edit: As requested, here is an example of a nested set.

Imagine a tree structure. Every node in this tree has a set of properties, such as a description or other attributes. Each node also holds a reference to its parent node. Besides this, each node carries two numbers: its left and right positions in the tree, assigned while traversing it depth-first:

A(parent:null, left:1, right:12, desc:"root node")
B(parent:A, left:2, right:3, desc:"left child")
C(parent:A, left:4, right:11, desc:"right child")
D(parent:C, left:5, right:6, desc:"foo")
E(parent:C, left:7, right:10, desc:"bar")
F(parent:E, left:8, right:9, desc:"baz")

Calculating all ancestors of a node is now easy:

ancestors(X) = all nodes N WHERE N.left < X.left AND N.right > X.right

For the node F (X = F) you'll get E, C and A. Ordering them by their left value gives the ancestors in order from the root down: [A, C, E].
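
The numbering and the ancestor lookup can be sketched in a few lines of Python. The helper names `assign_positions` and `ancestors`, and the representation of the tree as a dict mapping each parent to its children, are assumptions made for this illustration only.

```python
def assign_positions(tree, root):
    """Number nodes depth-first; returns {name: (left, right)}."""
    positions = {}
    counter = 0

    def visit(name):
        nonlocal counter
        counter += 1
        left = counter  # assigned when the node is first entered
        for child in tree.get(name, []):
            visit(child)
        counter += 1  # right is assigned after all descendants are numbered
        positions[name] = (left, counter)

    visit(root)
    return positions


def ancestors(positions, target):
    """Return all nodes whose (left, right) interval encloses the target's."""
    left, right = positions[target]
    found = [n for n, (l, r) in positions.items() if l < left and r > right]
    # Sorting by left value orders the ancestors from the root down.
    return sorted(found, key=lambda n: positions[n][0])


# The example tree from above: parent -> list of children.
tree = {"A": ["B", "C"], "C": ["D", "E"], "E": ["F"]}
positions = assign_positions(tree, "A")
f_ancestors = ancestors(positions, "F")
```

With this tree, `positions` reproduces the left/right values listed above (for instance `(4, 11)` for C), and `f_ancestors` is `["A", "C", "E"]`.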

So now you can use this criterion as the filter part of an ES query, and use a second query to search the attributes of the filtered nodes.
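
One possible shape for such a request body, written as a Python dict, is shown below. It assumes each node is indexed as its own document with numeric `left` and `right` fields and a text field `desc`, as in the example tree; the function name `ancestor_query` is hypothetical, and the two steps are combined here into a single `bool` query.

```python
def ancestor_query(left, right, search_text):
    """Build a query body matching ancestors of the node at (left, right)."""
    return {
        "query": {
            "bool": {
                # Filter clauses restrict the result to enclosing nodes
                # without contributing to the relevance score.
                "filter": [
                    {"range": {"left": {"lt": left}}},
                    {"range": {"right": {"gt": right}}},
                ],
                # Full-text search over the attributes of the filtered nodes.
                "must": [
                    {"match": {"desc": search_text}},
                ],
            }
        }
    }


# Ancestors of F (left 8, right 9) whose description mentions "child".
body = ancestor_query(8, 9, "child")
```

The resulting `body` can be passed as the search request body by whichever client you use.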

This structure is very efficient when looking for subtrees, but has downsides when you change the node order/position.

If you need further explanation, please add a comment.

Upvotes: 1
