Reputation: 375
While doing a coding exercise, I came across this problem:
"Write a function that takes in a list of dictionaries with a key and list of integers and returns a dictionary with standard deviation of each list."
e.g.
input = [
{
'key': 'list1',
'values': [4,5,2,3,4,5,2,3]
},
{
'key': 'list2',
'values': [1,1,34,12,40,3,9,7],
}
]
Answer: {'list1': 1.12, 'list2': 14.19}
Note that 'values' is actually the key for the list of values, which is a little deceptive at first!
My attempt:
def stdv(x):
    for i in range(len(x)):
        for k,v in x[i].items():
            result = {}
            print(result)
            if k == 'values':
                mean = sum(v)/len(v)
                variance = sum([(j - mean)**2 for j in v]) / len(v)
                stdv = variance**0.5
                return stdv # not sure here!!
                result = {k, v} # this is where i get stuck
I was able to calculate the standard deviation, but I have no idea how to put the results back into a dictionary as shown in the answer. Can anyone shed some light on it? Much appreciated!
Upvotes: 5
Views: 2185
Reputation: 12154
Using statistics.pstdev and dictionary comprehensions.
from statistics import pstdev
# don't shadow the input builtin!
input_ = [
{
'key': 'list1',
'values': [4,5,2,3,4,5,2,3]
},
{
'key': 'list2',
'values': [1,1,34,12,40,3,9,7],
}
]
result = { di["key"] : pstdev(di["values"]) for di in input_}
print(result)
{'list1': 1.118033988749895, 'list2': 14.185710239533304}
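Note that statistics offers both pstdev (population, divides by n) and stdev (sample, divides by n - 1); the expected answer in the question matches the population version. A minimal sketch of the difference, which also rounds to two decimals to match the expected output (the names `data`, `rounded` and `sample` are only illustrative):

```python
from statistics import pstdev, stdev

# Same data as in the question; `data` is just an illustrative name.
data = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]

# Population standard deviation (divides by n), rounded to 2 decimals.
rounded = {d['key']: round(pstdev(d['values']), 2) for d in data}
print(rounded)  # {'list1': 1.12, 'list2': 14.19}

# Sample standard deviation (divides by n - 1) gives larger values.
sample = {d['key']: round(stdev(d['values']), 2) for d in data}
print(sample)
```

If the exercise had asked for the sample standard deviation instead, swapping pstdev for stdev would be the only change needed.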
Upvotes: 1
Reputation: 130
I would try another implementation of the std calculation, since that one makes two passes over the data (still O(n), but with twice the iterations): you first loop to get the mean and then loop again to get the std. It can be done in a single loop, as noted here.
I'm not sure whether NumPy's implementation does that, but either way you can make a function like this one:
from numpy import std

def stdv(list_of_dicts):
    return {d['key']: std(d['values']) for d in list_of_dicts}
UPDATED:
If you really need to implement the std calculation yourself, you can make another function for that:
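def std(arr):
    n = len(arr)
    if n == 0:
        return 0
    _sum, _sq_sum = 0, 0
    # single pass: accumulate the sum and the sum of squares
    for v in arr:
        _sum += v
        _sq_sum += v ** 2
    _sq_mean = (_sum / n) ** 2
    # clamp at 0: rounding can make the difference slightly negative
    return max(_sq_sum / n - _sq_mean, 0) ** 0.5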
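One caveat with the sum-of-squares formula above: when the mean is large compared to the spread, subtracting two big, nearly equal numbers can lose precision. A commonly used single-pass alternative is Welford's method, which updates a running mean instead. This is only a sketch; the function name `welford_pstdev` is made up for this example and it computes the population standard deviation, like the code above.

```python
def welford_pstdev(values):
    """Population standard deviation in one pass (Welford's method)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean          # distance from the old mean
        mean += delta / count     # update the running mean
        m2 += delta * (v - mean)  # uses both the old and the new mean
    if count == 0:
        return 0.0  # same empty-input convention as the function above
    return (m2 / count) ** 0.5


print(welford_pstdev([4, 5, 2, 3, 4, 5, 2, 3]))
```

For small integer lists like the ones in the question, both versions should agree to many decimal places; the difference only matters for badly scaled data.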
Since you said that this is a coding exercise, I will try to point out where I think your mistake is.
You loop over x[i].items() to get each key and value, and then check whether the key is 'values' before performing the std calculation. Since you want to store the result in a dictionary, you also need the value of the 'key' field at the same time. With that loop you only get one of those at a time.
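To make that concrete, here is a small sketch using one dictionary from the question (the names `record`, `name` and `numbers` are only illustrative):

```python
record = {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]}

# Looping over .items() yields one (key, value) pair per iteration,
# so the label and the numbers never show up in the same iteration.
for k, v in record.items():
    print(k, v)
# key list1
# values [4, 5, 2, 3, 4, 5, 2, 3]

# Indexing by field name gives both at once.
name, numbers = record['key'], record['values']
print(name, len(numbers))  # list1 8
```

With both `name` and `numbers` available together, the computed value can be stored under the right label in a single step.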
Also, not directly related: if you want to loop over a list to get the values inside and you don't care about the index, it is better to do:
for x_i in x:
    for k,v in x_i.items():
Instead of:
for i in range(len(x)):
    for k,v in x[i].items():
I would recommend this video.
Upvotes: 3
Reputation: 368
You can add the results to each dictionary with the update method, like this:
x = [
{
'key': 'list1',
'values': [4,5,2,3,4,5,2,3]
},
{
'key': 'list2',
'values': [1,1,34,12,40,3,9,7],
}
]
arr = []
for i in range(len(x)):
    for k, v in x[i].items():
        if k == 'values':
            mean = sum(v) / len(v)
            variance = sum([(j - mean)**2 for j in v]) / len(v)
            stdv = variance**0.5
            arr.append(stdv)
# attach each standard deviation to its dictionary
for i in range(len(arr)):
    x[i].update({"stdv": arr[i]})
print(x)
Upvotes: 0
Reputation: 1301
Try the following. Note that it does not add the values to the array of dictionaries. Instead, it returns a new dictionary (as shown in 'Answer:') where each key is the 'key' value from one of the dictionaries in the array:
def stdv(x):
    ret = {}
    for i in range(len(x)):
        v = x[i]['values']
        mean = sum(v)/len(v)
        variance = sum([(j - mean)**2 for j in v]) / len(v)
        ret[x[i]['key']] = variance**0.5
    return ret
Upvotes: 1