Roxana Tapia
Roxana Tapia

Reputation: 115

MongoDB - Project specific element from array (big data)

I got a big array with data in the following format:

{
  "application": "myapp",
  "buildSystem": {
    "counter": 2361.1,
    "hostname": "host.com",
    "jobName": "job_name",
    "label": "2361",
    "systemType": "sys"
  },
  "creationTime": 1517420374748,
  "id": "123",
  "stack": "OTHER",
  "testStatus": "PASSED",
  "testSuites": [
    {
      "errors": 0,
      "failures": 0,
      "hostname": "some_host",
      "properties": [

        {
          "name": "some_name",
          "value": "UnicodeLittle"
        },
        <MANY MORE PROPERTIES>,
        {
          "name": "sun",
          "value": ""
        }
      ],
      "skipped": 0,
      "systemError": "",
      "systemOut": "",
      "testCases": [
        {
          "classname": "IdTest",
          "name": "has correct representation",
          "status": "PASSED",
          "time": "0.001"
        },
        <MANY MORE TEST CASES>,
        {
          "classname": "IdTest",
          "name": "normalized values",
          "status": "PASSED",
          "time": "0.001"
        }
      ],
      "tests": 8,
      "time": 0.005,
      "timestamp": "2018-01-31T17:35:15",
      "title": "IdTest"
    }
      <MANY MORE TEST SUITES  >,
]}

Where I can distinct three main structures with big data: TestSuites, Properties, and TestCases. My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete. I would like to select only the "time" value from TestSuites, but it kind of conflicts with the "time" of TestCases in my query:

    db.my_tests.find(
        {
        application: application,
        creationTime:{
            $gte: start_date.valueOf(),
            $lte: end_date.valueOf()
        }
    },
    {
        application: 1, 
        creationTime: 1,
        buildSystem: 1,
        "testSuites.time": 1,
        _id:1
    }
   )

Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time: 1 without success. Please notice that TestSuites is an array of one element with a dictionary.

I already checked this similar post without success: Mongodb update the specific element from subarray

Upvotes: 1

Views: 258

Answers (2)

Roxana Tapia
Roxana Tapia

Reputation: 115

Following code prints duration of each TestSuite:

query = db.my_collection.aggregate(
    [
        {$match: {

            application: application,
            creationTime:{
                $gte: start_date.valueOf(),
                $lte: end_date.valueOf()
            }
        }
        },
        { $project :
            { duration: { $sum: "$testSuites.time"}}
        } 
    ] 
).forEach(function(doc) 
    {
        print(doc._id)
        print(doc.duration)
    }
)

Upvotes: 1

lefas
lefas

Reputation: 32

Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time

Answering to your problem of prejecting only the time property of the testSuites document you can simply try projecting it with "testSuites.time" : 1 (you need to add the quotes for the dot notation property references).

My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete
As for your task, i suggest you try out the mongodb's aggregation framework for your calculations documents tranformations. The aggregations framework option {allowDiskUse : true} will also help you if you are proccessing "large" documents.

Upvotes: 0

Related Questions