I have a list of tuples called routers that looks like this:
('142.104.68.167', 11.111999999999853)
('142.104.68.167', 11.369000000000142)
('142.104.68.167', 11.618999999999915)
('142.104.68.1', 16.60699999999997)
('142.104.68.1', 16.847999999999956)
('142.104.68.1', 17.097000000000207)
('192.168.9.5', 15.727999999999838)
('192.168.9.5', 16.01800000000003)
('192.168.9.5', 16.279999999999973)
I have more entries in the list, but for now this should be enough. I want to calculate the mean and standard deviation of the values that share the same "key". For example, calculate the average and s.d. of all values whose "key" is 142.104.68.167, then the average and s.d. of all values whose "key" is 142.104.68.1, and so on.
I have tried doing it this way, but it is incorrect:
final_router_list = []
for i in range(len(routers)):
    for j in range(len(routers)):
        if (routers[i][0] == routers[j][0]):
            if ((routers[i][0] not in final_router_list) and (routers[j][0] not in final_router_list)):
                final_router_list.append(routers[i][0])
sum = 0
for i in range(len(routers)):
    for j in range(len(final_router_list)):
        if (routers[i][0] == final_router_list[j]):
            sum = sum + routers[i][1]
            print(routers[i][0],"rout:",final_router_list[j],"time:",routers[i][1],"sum:",sum)
This is the output that I get:
142.104.68.167 rout: 142.104.68.167 time: 11.111999999999853 sum: 11.111999999999853
142.104.68.167 rout: 142.104.68.167 time: 11.369000000000142 sum: 22.480999999999995
142.104.68.167 rout: 142.104.68.167 time: 11.618999999999915 sum: 34.09999999999991
142.104.68.1 rout: 142.104.68.1 time: 16.60699999999997 sum: 50.70699999999988
142.104.68.1 rout: 142.104.68.1 time: 16.847999999999956 sum: 67.55499999999984
142.104.68.1 rout: 142.104.68.1 time: 17.097000000000207 sum: 84.65200000000004
192.168.9.5 rout: 192.168.9.5 time: 15.727999999999838 sum: 100.37999999999988
192.168.9.5 rout: 192.168.9.5 time: 16.01800000000003 sum: 116.39799999999991
192.168.9.5 rout: 192.168.9.5 time: 16.279999999999973 sum: 132.67799999999988
What I want is a running sum that restarts for each router:
142.104.68.167 rout: 142.104.68.167 time: 11.111999999999853 sum: 11.111999999999853
142.104.68.167 rout: 142.104.68.167 time: 11.369000000000142 sum: 22.480999999999995
142.104.68.167 rout: 142.104.68.167 time: 11.618999999999915 sum: 34.09999999999991
142.104.68.1 rout: 142.104.68.1 time: 16.60699999999997 sum: 16.60699999999997
142.104.68.1 rout: 142.104.68.1 time: 16.847999999999956 sum: 33.455
142.104.68.1 rout: 142.104.68.1 time: 17.097000000000207 sum: 50.552
192.168.9.5 rout: 192.168.9.5 time: 15.727999999999838 sum: 15.727999999999838
192.168.9.5 rout: 192.168.9.5 time: 16.01800000000003 sum: 31.746
192.168.9.5 rout: 192.168.9.5 time: 16.279999999999973 sum: 48.026
Your title asks for standard deviation and mean, but your code seems to just be calculating the cumulative sum of the times...
For what is requested in your title, there are several approaches. I'll provide a pure Python solution. First, convert your data into a data structure that is more amenable to what you're attempting to do:
list_of_tups = [('142.104.68.167', 11.111999999999853),
                ('142.104.68.167', 11.369000000000142),
                ('142.104.68.167', 11.618999999999915),
                ('142.104.68.1', 16.60699999999997),
                ('142.104.68.1', 16.847999999999956),
                ('142.104.68.1', 17.097000000000207),
                ('192.168.9.5', 15.727999999999838),
                ('192.168.9.5', 16.01800000000003),
                ('192.168.9.5', 16.279999999999973)]
data = {}
for ip, time in list_of_tups:
    data[ip] = data.get(ip, []) + [time]
This gives a dictionary where each IP address is a key and its times are stored in a list. From here, you can perform the mathematical operations you want quite easily:
import statistics as stat
for ip, times in data.items():
    print(f"ip: {ip}\n times: {times}\n stdev: {stat.stdev(times)}\n mean: {stat.mean(times)}\n")
Output:
ip: 142.104.68.167
times: [11.111999999999853, 11.369000000000142, 11.618999999999915]
stdev: 0.2535080537839964
mean: 11.366666666666637
ip: 142.104.68.1
times: [16.60699999999997, 16.847999999999956, 17.097000000000207]
stdev: 0.2450108841120974
mean: 16.85066666666671
ip: 192.168.9.5
times: [15.727999999999838, 16.01800000000003, 16.279999999999973]
stdev: 0.27611833212116077
mean: 16.008666666666613
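A note on which standard deviation you get: `statistics.stdev` computes the sample standard deviation (dividing by n - 1), while `statistics.pstdev` computes the population standard deviation (dividing by n). With only three timings per router the two differ noticeably, so it is worth choosing deliberately. A minimal comparison, rebuilding the same `data` dictionary so the snippet runs on its own:

```python
import statistics as stat

# Same sample data as in the question
list_of_tups = [('142.104.68.167', 11.111999999999853),
                ('142.104.68.167', 11.369000000000142),
                ('142.104.68.167', 11.618999999999915),
                ('142.104.68.1', 16.60699999999997),
                ('142.104.68.1', 16.847999999999956),
                ('142.104.68.1', 17.097000000000207),
                ('192.168.9.5', 15.727999999999838),
                ('192.168.9.5', 16.01800000000003),
                ('192.168.9.5', 16.279999999999973)]

# Group the times by IP, exactly as above
data = {}
for ip, time in list_of_tups:
    data[ip] = data.get(ip, []) + [time]

for ip, times in data.items():
    sample_sd = stat.stdev(times)       # sum of squared deviations / (n - 1)
    population_sd = stat.pstdev(times)  # sum of squared deviations / n
    print(f"{ip}: stdev={sample_sd:.4f} pstdev={population_sd:.4f}")
```

Use `stdev` if the timings are a sample of the measurements you could have taken, and `pstdev` if they are the complete set you care about.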
@blorgon already showed you how to compute the stats in his answer.
However, nobody seems to have thought of using defaultdict, which is perfect for this kind of task. You can group your times by IP this way:
from collections import defaultdict
routers = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]
data = defaultdict(list)
for ip, time in routers:
    data[ip].append(time)
To compute the stats, you can then iterate over data.items() the same way.
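Putting the two pieces together, here is a minimal sketch that groups with `defaultdict` and then iterates over `data.items()` to compute the mean and sample standard deviation with the `statistics` module. The `results` dictionary is a hypothetical name introduced here just to collect the numbers:

```python
from collections import defaultdict
import statistics

routers = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]

# Group the times by IP: a missing key starts out as an empty list
data = defaultdict(list)
for ip, time in routers:
    data[ip].append(time)

# results (hypothetical name) maps each IP to a (mean, sample stdev) pair
results = {}
for ip, times in data.items():
    results[ip] = (statistics.mean(times), statistics.stdev(times))

for ip, (mean, sd) in results.items():
    print(f"{ip}: mean={mean:.3f} stdev={sd:.3f}")
```

Keep in mind that `statistics.stdev` raises `statistics.StatisticsError` for an IP that has fewer than two times.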
I believe this is what you're asking for. I added comments as an explanation.
routers = [('142.104.68.167', 11.111999999999853),
           ('142.104.68.167', 11.369000000000142),
           ('142.104.68.167', 11.618999999999915),
           ('142.104.68.1', 16.60699999999997),
           ('142.104.68.1', 16.847999999999956),
           ('142.104.68.1', 17.097000000000207),
           ('192.168.9.5', 15.727999999999838),
           ('192.168.9.5', 16.01800000000003),
           ('192.168.9.5', 16.279999999999973)]
# a dictionary for your routers
route_dict = {}
# loop through routers
for each in routers:
    # if the ip is already in the dictionary, add onto the existing value
    if each[0] in route_dict:
        route_dict[each[0]] += each[1]
    # if ip not already in the dictionary, add a new item into it
    else:
        route_dict[each[0]] = each[1]
# print the keys(rout) and values(sum) in the dictionary
for key, value in route_dict.items():
    print('rout: %s sum: %r' % (key, value))
Output:
rout: 142.104.68.167 sum: 34.09999999999991
rout: 142.104.68.1 sum: 50.552000000000135
rout: 192.168.9.5 sum: 48.02599999999984
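The dictionary above only accumulates totals. Since the title asks for the mean, one way to extend the same pattern, assuming you only need the mean and not the standard deviation, is to keep a count next to each total and divide at the end. `totals` and `counts` are hypothetical names used for illustration:

```python
routers = [('142.104.68.167', 11.111999999999853),
           ('142.104.68.167', 11.369000000000142),
           ('142.104.68.167', 11.618999999999915),
           ('142.104.68.1', 16.60699999999997),
           ('142.104.68.1', 16.847999999999956),
           ('142.104.68.1', 17.097000000000207),
           ('192.168.9.5', 15.727999999999838),
           ('192.168.9.5', 16.01800000000003),
           ('192.168.9.5', 16.279999999999973)]

totals = {}  # hypothetical: IP -> sum of its times
counts = {}  # hypothetical: IP -> number of times seen for that IP
for ip, time in routers:
    totals[ip] = totals.get(ip, 0.0) + time
    counts[ip] = counts.get(ip, 0) + 1

# the mean is the per-IP total divided by the per-IP count
for ip in totals:
    mean = totals[ip] / counts[ip]
    print('rout: %s mean: %r' % (ip, mean))
```

This is a single pass over the list, but it cannot give you the standard deviation without also tracking the squared times or keeping the individual values, which is why the grouped-list approach is more convenient for both statistics.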
If you want the specified output, you can loop through the list, append each time to a per-address list in a dictionary, and print the running sum as you go. This uses f-strings, which require Python 3.6 or later:
address_and_times = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]
tracking_dict = {}
for address in address_and_times:
    if address[0] not in tracking_dict:
        tracking_dict[address[0]] = [address[1]]
    else:
        tracking_dict[address[0]].append(address[1])
    print(f"{address[0]} rout: {address[0]} time: {address[1]} sum: {sum(tracking_dict[address[0]])}")
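If the entries for each router are always contiguous, as they are in the sample data, `itertools.groupby` combined with `itertools.accumulate` is another way to print the running sum that restarts for each router. This is a swapped-in technique, not what the answer above uses, and it rests on that contiguity assumption: `groupby` only merges adjacent items with equal keys. The `rows` list is a hypothetical addition that just keeps the printed values around:

```python
from itertools import accumulate, groupby
from operator import itemgetter

address_and_times = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]

rows = []  # hypothetical: (ip, time, running_sum) tuples, kept for inspection
# Assumes entries for the same IP are adjacent; groupby yields one group per run
for ip, group in groupby(address_and_times, key=itemgetter(0)):
    times = [time for _, time in group]
    # accumulate runs on this group's own list, so the sum restarts per router
    for time, running_sum in zip(times, accumulate(times)):
        rows.append((ip, time, running_sum))
        print(f"{ip} rout: {ip} time: {time} sum: {running_sum}")
```

For unordered input, `sorted(address_and_times, key=itemgetter(0))` would make the groups contiguous first; the sort is stable, so each router's times keep their original order, but the routers themselves end up in sorted order.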