Sami Wood

Reputation: 535

Using ArangoSearch to find JSON keys in a collection and clean data

I am using ArangoDB for a project in which the JSON objects have a variable structure. I was thinking I could create a view that searches the collection and extracts the pertinent keys. All of the documentation seems to assume that you know where a field of interest is located, but I do not. For example, I would like to create a view on the 'foo' field, but I do not know where in the many nested objects it may be. My JSON file/collection could look as follows:

[{"foo": "bar"},
 {"nest":
       {"nesty":
           {"foo": "bar"}
       }
 },
 {"arr": [{"blah": "blah"}, {"foo": "blah"}]}, etc.]

I understand that ArangoDB has a text search engine, but it seems to be useful only if the data is clean. Ideally, I would like a way to search the document itself to build the view. There is far too much data and variation for me to hardcode every path, but the key string "foo" is always consistent. I would like the view to contain the docs as:

[{"foo": "bar"}, {"foo": "bar"}, {"foo": "blah"}, etc.]

So my questions are: 1. Is there a way to perform such data cleansing in ArangoDB? 2. If not, what are the dominant paradigms for handling such data? Should I be attempting to clean this outside of ArangoDB?

Upvotes: 0

Views: 632

Answers (1)

CodeManX

Reputation: 11915

Views allow you to index all fields by setting "includeAllFields": true at the top level of a link (the collection key). However, you still need to specify a concrete attribute path at query time, e.g. SEARCH doc.nest.nesty.foo == "bar". There is no support for searching arbitrary paths like SEARCH doc.* == "bar".
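To make that concrete, here is a rough sketch of what the View link properties and a matching query could look like, written as plain Python data so it runs without a server. The collection name `docs`, the View name `docsView`, and the helper `search_query` are made up for illustration; only the "links" / "includeAllFields" structure and the SEARCH syntax come from ArangoDB itself.

```python
import json

COLLECTION = "docs"    # hypothetical collection name
VIEW = "docsView"      # hypothetical View name

# Link properties for an arangosearch View: index every attribute
# of every document in the linked collection.
view_properties = {
    "links": {
        COLLECTION: {
            "includeAllFields": True,
        }
    }
}


def search_query(path: str) -> str:
    """Build an AQL query for one known attribute path (hypothetical helper).

    The path has to be spelled out in full; a wildcard such as
    doc.* is not supported by SEARCH.
    """
    return f"FOR doc IN {VIEW} SEARCH doc.{path} == @value RETURN doc"


print(json.dumps(view_properties, indent=2))
print(search_query("nest.nesty.foo"))
```

The properties could be passed to whatever driver or HTTP call you use to create the View, and the query would be run with a bind parameter such as {"value": "bar"}. Note that you would need one such query per distinct path, which is exactly the limitation described above.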

There is currently also no way to automatically flatten your data. If your data has a fixed nesting structure, then you could run a one-off AQL query to do that, but if I understand you correctly, then the data isn't like that. You could still do it on the client-side or perhaps write a user-defined AQL function to do it server-side.
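As an illustration of the client-side option, a minimal sketch in Python could walk each document recursively and collect every value stored under the key of interest, regardless of depth. The function names `find_key` and `flatten`, and the choice to store the results under a "data" attribute, are assumptions for this example and not anything built into ArangoDB.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in document order."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the key may also occur deeper inside the value.
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def flatten(doc: dict, key: str = "foo") -> dict:
    """Build a flattened document with one {key: value} entry per match."""
    return {"data": [{key: value} for value in find_key(doc, key)]}


# Sample documents shaped like the ones in the question.
docs = [
    {"foo": "bar"},
    {"nest": {"nesty": {"foo": "bar"}}},
    {"arr": [{"blah": "blah"}, {"foo": "blah"}]},
]
flattened = [flatten(d) for d in docs]
```

Each flattened document could then be written to a separate collection (or back onto the original document as an extra attribute) before linking that collection to a View. The same traversal logic could presumably be ported to a JavaScript user-defined AQL function if you prefer to keep it server-side.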

A flattened document like { "data": [{"foo":"bar"},{"foo":"bar"},{"foo":"blah"}] } can be indexed like a single attribute with a View. SEARCH doc.data.foo == "..." will search all three array elements and match the document if any of them satisfies the condition.

Upvotes: 0
