Peter S

Reputation: 575

Fastest way to get mean value from dictionary

print(data_Week) gives me:

{'2016-04-09 00:56': ['12.0', '50.7'], '2016-04-08 05:23': ['15.4', '49.8'], '2016-04-....}

Each value holds a temperature reading and a humidity reading.

I'd like to get the average values from the dictionary data_Week.

The method I'm using works, but it takes forever on my Raspberry Pi:

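# Running totals and counts start at zero
temp_total_Week = hum_total_Week = 0.0
temp_length_Week = hum_length_Week = 0
for date,value in data_Week.items():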
    temp_first_value_Week = float(value[0])
    temp_total_Week += temp_first_value_Week
    temp_length_Week += 1
    hum_first_value_Week = float(value[1])
    hum_total_Week += hum_first_value_Week
    hum_length_Week += 1
if temp_length_Week > 1:
    tempAverage_Week = temp_total_Week/temp_length_Week
    tempAverage_Week = "%.2f" % tempAverage_Week
tempAverage_Week = str(tempAverage_Week)+'\xb0C'
if hum_length_Week > 1:
    humAverage_Week = hum_total_Week/hum_length_Week
    humAverage_Week = "%.2f" % humAverage_Week
humAverage_Week = str(humAverage_Week)+'%'

There's a dictionary entry every minute, and I'm trying to get the average values for a week. That means 1440 temperature values and 1440 humidity values per day, or 10080 of each per week. Is there a smart way to get the average values? The method above takes the Pi around 15 minutes.

Edit: I found out that the script took so long because I was looping over the dictionary, which was not necessary, as BHawk mentioned in his post.

I'm going to use the one-liner from John Coleman. It works perfectly. And thanks for the pandas approach. If the current version slows down again, I may switch to it. Thanks for the help.

Upvotes: 0

Views: 73

Answers (3)

BHawk

Reputation: 2472

You don't need to increment the number of entries each time you read a value; len() gives you the count directly. The values in your sample are strings, though, so they still need to be converted with float() before summing.

Try:

week_length = len(data_Week)
if week_length > 0:  # guard against dividing by zero on an empty dictionary
    tempAverage_Week = sum(float(x) for x, y in data_Week.values()) / week_length
    humAverage_Week = sum(float(y) for x, y in data_Week.values()) / week_length
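To check how long this takes on your own hardware, something like the following rough sketch could help. It assumes the values are numeric strings, as in the sample from the question. The readings are synthetic placeholders (every entry is '20.0' and '50.0'), and the helper name `averages` is made up for illustration; swap in your real data_Week to get a meaningful timing.

```python
import timeit
from datetime import datetime, timedelta

# Synthetic stand-in for data_Week: one made-up reading per minute for 7 days.
start = datetime(2016, 4, 2)
data_Week = {
    (start + timedelta(minutes=i)).strftime('%Y-%m-%d %H:%M'): ['20.0', '50.0']
    for i in range(7 * 24 * 60)
}

def averages(data):
    """Return (mean temperature, mean humidity) for a non-empty dict."""
    n = len(data)  # the count comes from len(), not from a manual counter
    temp = sum(float(t) for t, h in data.values()) / n
    hum = sum(float(h) for t, h in data.values()) / n
    return temp, hum

# Average the wall-clock time over 10 calls to smooth out noise.
elapsed = timeit.timeit(lambda: averages(data_Week), number=10) / 10
print(len(data_Week), "entries, about", elapsed, "seconds per call")
```

If the measured time for this part is small, the slowdown is probably somewhere else in the script.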

Upvotes: 1

Lauro Moura

Reputation: 750

Have you tried pandas? I think it could perform better for this volume and type of data and the operations you're doing. For example, I saved your sample data in a JSON file and ran the following script:

import pandas as pd

with open("data.json", "r") as handle:
  x = pd.read_json(handle, orient='index')

print("Data:")
print(x)
print("Description:")
print(x.describe()) # Will print a summary of each column

Result

Data:
                        0     1
2016-04-08 05:23:00  15.4  49.8
2016-04-09 00:56:00  12.0  50.7
Description:
               0          1
count   2.000000   2.000000
mean   13.700000  50.250000
std     2.404163   0.636396
min    12.000000  49.800000
25%    12.850000  50.025000
50%    13.700000  50.250000
75%    14.550000  50.475000
max    15.400000  50.700000

Upvotes: 2

John Coleman

Reputation: 52008

Maybe I'm missing something, but if all you want is the average of the temperatures then a 1-line solution should be possible and should run fast:

>>> d = {'2016-04-09 00:56': ['12.0', '50.7'], '2016-04-08 05:23': ['15.4', '49.8']}
>>> sum(float(x) for x,y in d.values())/len(d)
13.7
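If you want both averages, the same idea extends naturally. Here is a minimal sketch, assuming every value is a two-item list of numeric strings as in the sample above; the function name `weekly_averages` is hypothetical, and `statistics.fmean` needs Python 3.8 or later.

```python
from statistics import fmean

def weekly_averages(data):
    """Return (mean temperature, mean humidity), or None if data is empty."""
    if not data:
        return None
    # zip(*...) transposes the [temp, hum] pairs into two columns.
    temps, hums = zip(*data.values())
    return fmean(map(float, temps)), fmean(map(float, hums))

d = {'2016-04-09 00:56': ['12.0', '50.7'], '2016-04-08 05:23': ['15.4', '49.8']}
temp_avg, hum_avg = weekly_averages(d)

# Format only at the end, so the averages stay numeric until they are displayed.
print("%.2f\xb0C" % temp_avg)
print("%.2f%%" % hum_avg)
```

Keeping the numbers as floats until the final print avoids mixing string formatting into the calculation.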

Upvotes: 2
