jwillis0720

Reputation: 4477

MongoDB Aggregation Pipeline Multiple Groups Complicating Pipeline

Here is my Document structure:

{
 "_id" : ObjectId("50dcd7ff4de274a2c4a31df0"),
 "seq_name" : "169:D18M6ACXX:1:1111:17898:82486:GTGACA_10",
 "raw_seq" : "TTGACCTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCAGTAGTCAACGGGAGTTAGACTTCTCGCACAGTAATAAACAGCCGTGTCCTCGGCTCTCAGGCTGTTCATTTGCAGA",
 "seq_aa" : "LQMNSLRAEDTAVYYCARSLTPVDYWGQGTLVTVSSGQ",
 "cdr3_seq" : "GCGAGAAGTCTAACTCCCGTTGACTAC",
 "cdr3_seq_aa" : "ARSLTPVDY",
 "cdr3_seq_len" : 27,
 "cdr3_seq_aa_len" : 9,
 "vg" : "IGHV3-48*03",
 "dg" : "IGHD3-10*02R",
 "jg" : "IGHJ4*02",
 "donor" : 10
}

I really enjoy the MongoDB aggregation framework, but I'm having trouble with this grouping pipeline, and since I can't $out to another collection yet, it all has to happen in a single pipeline. I can do this multi-group pipeline:

db.collection.aggregate([
   {$match: {cdr3_seq_aa_len: {$gt: 3}}},
   {$group: {_id: "$cdr3_seq_aa", other_set: {$addToSet: "$cdr3_seq_aa_len"}}},
   {$group: {_id: "$other_set", sum: {$sum: 1}}}
])

This gives me the number of unique cdr3_seq_aa strings, grouped by length.

{ id:40, sum:1002031,
  id:41, sum:1949402,....

However, the first thing I would like to do is group by donor, so that I know how many unique cdr3_seq_aa strings there are for each donor. Then, within each donor, I would like to group by length and count how many strings have each length.

Upvotes: 2

Views: 6369

Answers (1)

mjhm

Reputation: 16705

If I understand the question correctly, this is what you're looking for. The key concept is that you can construct compound _id's from multiple fields.

db.collection.aggregate(
[
    {$match: {cdr3_seq_aa_len: {$gt: 3}}},
    {$group: 
         {
              _id: {donor: "$donor", cdr3_seq_aa: "$cdr3_seq_aa"},
              donor_cdr3_seq_aa_count: {$sum: 1},
              cdr3_seq_aa_len: {$first: "$cdr3_seq_aa_len"}
         }
    },
    {$group:
         {
             _id: {donor: "$_id.donor", len: "$cdr3_seq_aa_len"},
             num_strings_with_this_length: {$sum: 1},
             total_doc_count_by_length:
                  {$sum: "$donor_cdr3_seq_aa_count"}
         }
    }
])
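
To make the two $group stages easier to follow, here is a rough sketch of the same logic in plain JavaScript, run against a few made-up documents. It does not use MongoDB at all: the sample data and the groupBy and countByDonorAndLength helpers are hypothetical and only there for illustration. The first pass collapses duplicates per (donor, cdr3_seq_aa) pair, and the second pass counts those unique strings per (donor, length) pair.

```javascript
// Hypothetical sample documents; only the fields the pipeline uses are kept.
const docs = [
  { donor: 10, cdr3_seq_aa: "ARSLTPVDY", cdr3_seq_aa_len: 9 },
  { donor: 10, cdr3_seq_aa: "ARSLTPVDY", cdr3_seq_aa_len: 9 },
  { donor: 10, cdr3_seq_aa: "ARGGTPVDY", cdr3_seq_aa_len: 9 },
  { donor: 11, cdr3_seq_aa: "ARSLTPVDY", cdr3_seq_aa_len: 9 },
  { donor: 11, cdr3_seq_aa: "ARS", cdr3_seq_aa_len: 3 }, // dropped by the $match step
];

// Hypothetical helper: group rows by a compound key and fold each group.
function groupBy(rows, keyOf, init, step) {
  const groups = new Map();
  for (const row of rows) {
    const id = keyOf(row);
    const k = JSON.stringify(id); // compound key turned into a stable string
    if (!groups.has(k)) groups.set(k, { _id: id, ...init(row) });
    step(groups.get(k), row);
  }
  return [...groups.values()];
}

function countByDonorAndLength(rows) {
  // Equivalent of {$match: {cdr3_seq_aa_len: {$gt: 3}}}
  const matched = rows.filter((d) => d.cdr3_seq_aa_len > 3);

  // First $group: one entry per unique (donor, cdr3_seq_aa) pair.
  const perString = groupBy(
    matched,
    (d) => ({ donor: d.donor, cdr3_seq_aa: d.cdr3_seq_aa }),
    (d) => ({ donor_cdr3_seq_aa_count: 0, cdr3_seq_aa_len: d.cdr3_seq_aa_len }),
    (g) => { g.donor_cdr3_seq_aa_count += 1; }
  );

  // Second $group: count the unique strings per (donor, length) pair.
  return groupBy(
    perString,
    (g) => ({ donor: g._id.donor, len: g.cdr3_seq_aa_len }),
    () => ({ num_strings_with_this_length: 0, total_doc_count_by_length: 0 }),
    (out, g) => {
      out.num_strings_with_this_length += 1;
      out.total_doc_count_by_length += g.donor_cdr3_seq_aa_count;
    }
  );
}

const result = countByDonorAndLength(docs);
console.log(result);
```

In this sketch num_strings_with_this_length is the number of unique strings for that donor and length, while total_doc_count_by_length is the number of original documents behind them.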

Upvotes: 5
