motorcb

Reputation: 1051

MongoDB group by hour

I save tweets to MongoDB:

    twit.stream('statuses/filter', {'track': ['animal']}, function(stream) {
        stream.on('data', function(data) {
            console.log(util.inspect(data));

            data.created_at = new Date(data.created_at);
            collectionAnimal.insert(data, function(err, docs) {});
        });
    });

This works fine.

The tweet time is stored in MongoDB in the column created_at, in the format 2014-04-25 11:45:14 GMT. Now I need to group by the hour of created_at. I would like to get this result:

    hour | tweets in hour
    1    | 28
    2    | 26
    3    | 32
    4    | 42
    5    | 36
    ...

My unsuccessful attempt:

    $keys = array('created_at' => true);
    $initial = array('count' => 0);
    $reduce = "function(doc, prev) { prev.count += 1 }";

    $tweetsGroup = $this->collectionAnimal->group( $keys, $initial, $reduce );

But I am not able to group by hour this way.

How can I do it?

Upvotes: 17

Views: 23575

Answers (4)

Robert Dziubek

Reputation: 140

Since MongoDB 5.0 (released in 2021), you can use $dateTrunc with the unit: "hour" argument:

db.tweets.aggregate([
  {
    $project: {
      hour: { $dateTrunc: { date: "$created_at", unit: "hour" } },
    },
  },
  {
    $group: {
      _id: "$hour",
      count: { $sum: 1 },
    },
  },
])
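
To make the bucketing concrete, here is a minimal plain-JavaScript sketch of what truncating to the hour does to each date, assuming UTC (which $dateTrunc uses when no timezone is given). The helpers truncateToHourUTC and countByHour are illustrative names, not MongoDB APIs, and the sample dates are made up around the timestamp from the question.

```javascript
// Illustrative helpers (not MongoDB APIs): mimic hour-level bucketing in UTC.
function truncateToHourUTC(date) {
  const truncated = new Date(date.getTime());
  truncated.setUTCMinutes(0, 0, 0); // zero out minutes, seconds, milliseconds
  return truncated;
}

// Count dates per hour bucket, keyed by the bucket's ISO string.
function countByHour(dates) {
  const counts = new Map();
  for (const d of dates) {
    const key = truncateToHourUTC(d).toISOString();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample data for illustration only.
const sample = [
  new Date("2014-04-25T11:45:14Z"),
  new Date("2014-04-25T11:05:00Z"),
  new Date("2014-04-25T12:00:01Z"),
];
const hourly = countByHour(sample);
console.log(hourly.get("2014-04-25T11:00:00.000Z")); // 2
console.log(hourly.get("2014-04-25T12:00:00.000Z")); // 1
```

Note that each bucket is a full timestamp, so the same hour on different days ends up in different groups, unlike grouping on $hour alone.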

Upvotes: 1

schoon

Reputation: 3324

Lalit's answer did not work for me; it kept giving me zeroes. Instead I did:

db.tweets.aggregate(
 { "$project": {
      "y":{"$year":"$created_at"},
      "m":{"$month":"$created_at"},
      "d":{"$dayOfMonth":"$created_at"},
      "h":{"$hour":"$created_at"},
      "tweet":1 }
 },
 { "$group":{ 
       "_id": { "year":"$y","month":"$m","day":"$d","hour":"$h"},
       'count':{$sum:1} 
   }
 })

The only difference is 'count':{$sum:1}, which adds one per document instead of summing a field.

This might help someone new to MongoDB like me.

Upvotes: 6

Neil Lunn

Reputation: 151132

There should be no need for a $project stage here, since the date operators can be used directly in the $group stage when defining the grouping _id. This saves an extra pass over the entire collection just to get the result.

Also, you are just counting, so simply use { "$sum" : 1 }. Summing a field that didn't exist was the problem that resulted in 0:

    $this->collection->aggregate(array(
        array(
            '$group' => array(
                "_id" => array( 
                    "y" => array( '$year' => '$created_at' ),
                    "m" => array( '$month' => '$created_at' ),
                    "d" => array( '$dayOfMonth' => '$created_at' ),
                    "h" => array( '$hour' => '$created_at' ),
                ),
                "total" => array( '$sum' => 1 ),
            ),
        )
    ));

If anything, add a $match stage at the start of the pipeline to filter by date. If one day of output is acceptable, you only need $hour in the grouping, and the smaller working set makes the query faster. That is probably what you want to do anyway.
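
As a rough sketch of that suggestion, the pipeline for a single day could be built like this in JavaScript. The helper name buildHourlyCountPipeline, the example date range, and the trailing $sort stage are assumptions added for illustration; only the created_at field and the $match/$group idea come from the thread.

```javascript
// Hypothetical helper: builds a pipeline that counts documents per hour of day
// within the half-open window [start, end). "created_at" is the question's field.
function buildHourlyCountPipeline(start, end) {
  return [
    // Filter first so later stages only see the chosen time window.
    { $match: { created_at: { $gte: start, $lt: end } } },
    // Group on the hour of day (0-23, UTC) and count one per document.
    { $group: { _id: { $hour: "$created_at" }, total: { $sum: 1 } } },
    // Sort by hour so the output reads like the table in the question.
    { $sort: { _id: 1 } },
  ];
}

// Example window: one day, chosen arbitrarily for illustration.
const pipeline = buildHourlyCountPipeline(
  new Date("2014-04-25T00:00:00Z"),
  new Date("2014-04-26T00:00:00Z")
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell, the resulting array could be passed as db.tweets.aggregate(pipeline), assuming the collection is named tweets as in the other answers.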

Upvotes: 11

Lalit Agarwal

Reputation: 2354

I can show you how to group using the aggregation framework directly in the mongo console:

db.tweets.aggregate(
 { "$project": {
      "y":{"$year":"$created_at"},
      "m":{"$month":"$created_at"},
      "d":{"$dayOfMonth":"$created_at"},
      "h":{"$hour":"$created_at"},
      "tweet":1 }
 },
 { "$group":{ 
       "_id": { "year":"$y","month":"$m","day":"$d","hour":"$h"},
       "total":{ "$sum": "$tweet"}
   }
 })

For more options you can look here: http://docs.mongodb.org/manual/reference/operator/aggregation-date/

You will also need to find the appropriate way of using the aggregation framework from whichever programming language you are using.
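
For Node.js, which the question uses to insert the tweets, a minimal sketch might look like the following. It assumes a recent version of the official mongodb driver, where aggregate() returns a cursor whose toArray() returns a promise. The function name countTweetsByHour and the $sort stage are illustrative additions, and the counter uses { $sum: 1 } as suggested in the other answers.

```javascript
// Pipeline: group by year/month/day/hour of created_at and count documents.
const hourlyPipeline = [
  {
    $group: {
      _id: {
        y: { $year: "$created_at" },
        m: { $month: "$created_at" },
        d: { $dayOfMonth: "$created_at" },
        h: { $hour: "$created_at" },
      },
      total: { $sum: 1 }, // one per document
    },
  },
  // Assumed extra stage: order the buckets chronologically.
  { $sort: { "_id.y": 1, "_id.m": 1, "_id.d": 1, "_id.h": 1 } },
];

// `collection` is assumed to be a Collection object from the official
// `mongodb` driver (for example, the question's collectionAnimal).
async function countTweetsByHour(collection) {
  // aggregate() returns a cursor; toArray() resolves to the result documents.
  return collection.aggregate(hourlyPipeline).toArray();
}
```

It could then be called as countTweetsByHour(collectionAnimal).then(console.log), assuming collectionAnimal is connected as in the question.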

Upvotes: 30
