Charles Spinosa
Charles Spinosa

Reputation: 103

Structure and query syntax for recursive documents in MongoDB?

I've recently started looking into MongoDB for a project at work. I'm fairly new to JSON and MongoDB's query structure, so I'm hoping one of you can provide some clarification.

I've translated the problem into Excel terminology since it's common and represents my question fairly well.

If I were attempting to model an Excel formula into a MongoDB document, what is the best format to do it in (I'll explain my potential queries lower)? Keep in mind that formulas in Excel can be nested in (nearly) any order, and with any depth, and arguments can come in either string or numerical form. I would like to be able to search across these cells to answer such queries as "Find all cells that use the =AVG() function" or "Find all cells that contain the =SUM() function inside of an =AVG() function (such as =AVG(x,y,z,SUM(a,b,c)))." Being able to answer these formula-structure based queries is more important than being able to answer ones about numbers or strings if answering all isn't possible.

Currently I'm envisioning my documents having roughly the following format:

{
    formula: "AVG",
    arguments: [4,5, {
        formula: "SUM",
        arguments: [6,7,{
            formula: "ABS",
            arguments: [-8,-9]
(closing parenthesis/brackets)
}

Is that a reasonable format for what I'm looking to do? If it is, how would I query for "Find cases with =SUM inside of =AVG"? What about finding the =ABS formula that's nested even deeper? Because of the dynamic nature of formulas, its not really possible to expect a certain order or certain depth.

Upvotes: 7

Views: 5673

Answers (2)

Charles Spinosa
Charles Spinosa

Reputation: 103

Continuing with the Excel analogy:

First thing's first, I decided that querying on second level and beyond children wasn't really necessary (this makes the problem much, much simpler). This structure can't adapt easily to that, so fair warning. This is not a perfect answer to my original question, sorry! It seems like MongoDB just isn't well suited to store my original question very easily.

I've used a structure akin to how MongoDB recommends representing trees. Instead of trying to represent an Excel doc as one MongoDB document, it is now represented by many (one per unique formula used, and one to store strings and values). All MongoDB documents related to the same excel workbook simply have a field workbook and store which one they belong to there.

A very simple example (the custom _id tags aren't necessary, but are easier to read):

{_id: book1_1
 workbook: book1,
 cell_values: [1,2,3],
 cell_strings: ['hello','world']}

{_id:book1_2
 workbook:  book1,
 formula: 'SUM',
 children: ['AVG','ABS']}

{_id:book1_3
 workbook: book1,
 formula: 'AVG',
 children: ['SUM']}

{_id:book1_4
 workbook: book1,
 formula: 'ABS',
 children: ['SUM']}

These 4 MongoDB documents represent one Excel document that has the following formula structures (this is not the only excel sheet that would produce the above MongoDB documents):

=SUM(AVG())
=AVG(SUM())
=ABS(SUM(ABS()))

Along with values 1,2,3 and strings 'hello','world' somewhere inside of it.

Querying for 'Find books with SUM formula inside of AVG formula' is then the following query:

db.collection.find({$and: [{formula: 'AVG'},{children: 'SUM'}]})

returns the _id:book1_3 MongoDB document. You can then strip the workbook off that however you'd like.

Upvotes: 0

Derick
Derick

Reputation: 36794

If you have an arbitrary structure like this, then I suggest you store the trees in a different way. Arbitrary structures are difficult to query and deal with.

The MongoDB documentation has a few suggestions: http://docs.mongodb.org/manual/tutorial/model-tree-structures/

Upvotes: 5

Related Questions