xaroulis gekas

Reputation: 165

Finding the average while updating nested documents in MongoDB

I am wondering whether I can compute the average and store it along with my data in the same update. My code is this:

for file in sorted_files:
    df = process_file(file)

    for row,item in df.iterrows():
        data_dict = item.to_dict()
        mycol1.update_one(
            {"nsamples": {"$lt": 13}},
            {
                "$push": {"samples": data_dict},
                "$min": {"first": data_dict['timestamp1'],"minid13":data_dict['id13']},
                "$max": {"last": data_dict['timestamp1'],'maxid13':data_dict['id13']},
                "$inc": {"nsamples": 1,"totid13":data_dict['id13']}
            },
            upsert=True
        )

My data look like this:

{'_id': ObjectId('6068da8878fa2e568c42c7f1'),
 'first': datetime.datetime(2018, 1, 24, 14, 5),
 'last': datetime.datetime(2018, 1, 24, 15, 5),
 'maxid13': 12.5,
 'minid13': 7.5,
 'nsamples': 13,
 'samples': [{'c14': 'C',
              'id1': 3758.0,
              'id10': 0.0,
              'id11': 274.0,
              'id12': 0.0,
              'id13': 7.5,
              'id15': 0.0,
              'id16': 73.0,
              'id17': 0.0,
              'id18': 0.342,
              'id19': 6.3,
              'id20': 1206.0,
              'id21': 0.0,
              'id22': 0.87,
              'id23': 0.0,
              'id6': 2.0,
              'id7': -79.09,
              'id8': 35.97,
              'id9': 5.8,
              'timestamp1': datetime.datetime(2018, 1, 24, 14, 5),
              'timestamp2': datetime.datetime(2018, 1, 24, 9, 5)},
             {'c14': 'C',
              'id1': 3758.0,
              'id10': 0.0,
              'id11': 288.0,
              'id12': 0.0,
              'id13': 8.4,
              'id15': 0.0,
              'id16': 71.0,
              'id17': 0.0,
              'id18': 0.342,
              'id19': 6.3,
              'id20': 1207.0,
              'id21': 0.0,
              'id22': 0.69,
              'id23': 0.0,
              'id6': 2.0,
              'id7': -79.09,
              'id8': 35.97,
              'id9': 6.2,
              'timestamp1': datetime.datetime(2018, 1, 24, 14, 10),
              'timestamp2': datetime.datetime(2018, 1, 24, 9, 10)},
               .
               .
               .
               .

I use totid13 for that purpose, but if I need to find the average across many documents it is not very helpful. I tried something like this:

for file in sorted_files:
    df = process_file(file)
    #df.reset_index(inplace=True)  # Reset Index
    #data_dict = df.to_dict('records')  # Convert to dictionary
    # row is the row number and item is what the row contains
    for row,item in df.iterrows():
        data_dict = item.to_dict()
        mycol1.update_one(
            {"nsamples": {"$lt": 13}},
            {
                "$push": {"samples": data_dict},
                "$min": {"first": data_dict['timestamp1'],"minid13":data_dict['id13']},
                "$max": {"last": data_dict['timestamp1'],'maxid13':data_dict['id13']},
                "$avg":{"avg_id13":data_dict['id13']},
                "$inc": {"nsamples": 1,"totid13":data_dict['id13']}
            },
            upsert=True
        )

But the output is:

pymongo.errors.WriteError: Unknown modifier: $avg. Expected a valid update modifier or pipeline-style update specified as an array, full error: {'index': 0, 'code': 9, 'errmsg': 'Unknown modifier: $avg. Expected a valid update modifier or pipeline-style update specified as an array'}

Thanks in advance!

Upvotes: 0

Views: 31

Answers (1)

Belly Buster

Reputation: 8844

$avg is not an update operator; it is only an aggregation operator.
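Since $avg is available in aggregation, one option is to compute the average when reading the buckets rather than when writing them. The following is a minimal sketch, not a definitive implementation: `build_avg_pipeline` is a hypothetical helper name, and it assumes the documents have the `samples` array shown in the question. The `aggregate` call is left commented out because it needs a live collection.

```python
def build_avg_pipeline(field="id13"):
    """Build a pipeline that averages `field` over each bucket's samples.

    Hypothetical helper; assumes each document has a `samples` array
    of sub-documents that contain `field`, as in the question.
    """
    return [
        {
            "$project": {
                "first": 1,
                "last": 1,
                "nsamples": 1,
                # "$samples.id13" resolves to the array of id13 values
                # in the bucket, and $avg averages that array.
                f"avg_{field}": {"$avg": f"$samples.{field}"},
            }
        }
    ]


pipeline = build_avg_pipeline("id13")

# With the collection from the question, this would be run as:
# for doc in mycol1.aggregate(pipeline):
#     print(doc["_id"], doc["avg_id13"])
```

This keeps the update_one call unchanged and avoids storing a value that would have to be recomputed on every push.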

If you need the average, calculate that in pandas; you already have the data in pandas and it's what pandas is good at.
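As a rough illustration of the pandas route, the sketch below averages id13 over consecutive groups of 13 rows, mirroring the 13-sample buckets created by the update in the question. `bucket_averages` and the sample values are made up for the example, and it assumes the rows are inserted in order into buckets that start empty, so that row positions line up with bucket boundaries.

```python
import pandas as pd

BUCKET_SIZE = 13  # matches the {"nsamples": {"$lt": 13}} filter in the question


def bucket_averages(df, column="id13", bucket_size=BUCKET_SIZE):
    """Return the mean of `column` for each consecutive block of rows.

    Hypothetical helper; assumes row order equals insertion order.
    """
    # Row positions 0..12 map to bucket 0, 13..25 to bucket 1, and so on.
    bucket_ids = pd.Series(range(len(df)), index=df.index) // bucket_size
    return df[column].groupby(bucket_ids).mean()


# Illustrative data only: 14 rows, so one full bucket and one partial bucket.
df = pd.DataFrame({"id13": [7.5, 8.4, 12.5] + [10.0] * 11})
averages = bucket_averages(df)
print(averages)
```

The resulting per-bucket value could then be written with a plain "$set" once a bucket is complete, since "$set" is a valid update operator.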

Upvotes: 1
