doozy

Reputation: 159

Calculating average and s.d from a list of tuples

I have a list of tuples called routers that looks like this:

('142.104.68.167', 11.111999999999853)
('142.104.68.167', 11.369000000000142)
('142.104.68.167', 11.618999999999915)
('142.104.68.1', 16.60699999999997)
('142.104.68.1', 16.847999999999956)
('142.104.68.1', 17.097000000000207)
('192.168.9.5', 15.727999999999838)
('192.168.9.5', 16.01800000000003)
('192.168.9.5', 16.279999999999973)

I have more entries in the list, but this should be enough for now. I want to calculate the mean and standard deviation of the values that share the same "key": for example, the average and s.d. of all values whose "key" is 142.104.68.167, then the average and s.d. of all values whose "key" is 142.104.68.1, and so on.

I have tried doing it in this manner, but the result is incorrect:

final_router_list = []
for i in range(len(routers)):
    for j in range(len(routers)):
        if (routers[i][0] == routers[j][0]):
            if ((routers[i][0] not in final_router_list) and (routers[j][0] not in final_router_list)):
                final_router_list.append(routers[i][0])

sum = 0
for i in range(len(routers)):
    for j in range(len(final_router_list)):
        if (routers[i][0] == final_router_list[j]):
            sum = sum + routers[i][1]
            print(routers[i][0],"rout:",final_router_list[j],"time:",routers[i][1],"sum:",sum)

This is the output that I get:

142.104.68.167 rout: 142.104.68.167 time: 11.111999999999853 sum: 11.111999999999853
142.104.68.167 rout: 142.104.68.167 time: 11.369000000000142 sum: 22.480999999999995
142.104.68.167 rout: 142.104.68.167 time: 11.618999999999915 sum: 34.09999999999991
142.104.68.1 rout: 142.104.68.1 time: 16.60699999999997 sum: 50.70699999999988
142.104.68.1 rout: 142.104.68.1 time: 16.847999999999956 sum: 67.55499999999984
142.104.68.1 rout: 142.104.68.1 time: 17.097000000000207 sum: 84.65200000000004
192.168.9.5 rout: 192.168.9.5 time: 15.727999999999838 sum: 100.37999999999988
192.168.9.5 rout: 192.168.9.5 time: 16.01800000000003 sum: 116.39799999999991
192.168.9.5 rout: 192.168.9.5 time: 16.279999999999973 sum: 132.67799999999988

What I want it to be is:

142.104.68.167 rout: 142.104.68.167 time: 11.111999999999853 sum: 11.111999999999853
142.104.68.167 rout: 142.104.68.167 time: 11.369000000000142 sum: 22.480999999999995
142.104.68.167 rout: 142.104.68.167 time: 11.618999999999915 sum: 34.09999999999991
142.104.68.1 rout: 142.104.68.1 time: 16.60699999999997 sum: 16.60699999999997
142.104.68.1 rout: 142.104.68.1 time: 16.847999999999956 sum: 33.455
142.104.68.1 rout: 142.104.68.1 time: 17.097000000000207 sum: 50.552
192.168.9.5 rout: 192.168.9.5 time: 15.727999999999838 sum: 15.727999999999838
192.168.9.5 rout: 192.168.9.5 time: 16.01800000000003 sum: 31.746
192.168.9.5 rout: 192.168.9.5 time: 16.279999999999973 sum: 48.026

Upvotes: 1

Views: 282

Answers (4)

ddejohn

Reputation: 8960

Your title asks for standard deviation and mean, but your code seems to just be calculating the cumulative sum of the times...

For what is requested in your title, there are several approaches. I'll provide a pure Python solution. First, convert your data into a data structure that is more amenable to what you're attempting to do:

list_of_tups = [('142.104.68.167', 11.111999999999853),
                ('142.104.68.167', 11.369000000000142),
                ('142.104.68.167', 11.618999999999915),
                ('142.104.68.1', 16.60699999999997),
                ('142.104.68.1', 16.847999999999956),
                ('142.104.68.1', 17.097000000000207),
                ('192.168.9.5', 15.727999999999838),
                ('192.168.9.5', 16.01800000000003),
                ('192.168.9.5', 16.279999999999973)]


data = {}
for ip, time in list_of_tups:
    data[ip] = data.get(ip, []) + [time]

This gives a dictionary where each IP address is a key, and the times are stored in a list. From here, you can perform the mathematical operations you want quite easily:

import statistics as stat

for ip, times in data.items():
    print(f"ip: {ip}\n  times: {times}\n  stdev: {stat.stdev(times)}\n  mean: {stat.mean(times)}\n")

Output:

ip: 142.104.68.167
  times: [11.111999999999853, 11.369000000000142, 11.618999999999915]
  stdev: 0.2535080537839964
  mean: 11.366666666666637

ip: 142.104.68.1
  times: [16.60699999999997, 16.847999999999956, 17.097000000000207]
  stdev: 0.2450108841120974
  mean: 16.85066666666671

ip: 192.168.9.5
  times: [15.727999999999838, 16.01800000000003, 16.279999999999973]
  stdev: 0.27611833212116077
  mean: 16.008666666666613
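
One caveat worth sketching: statistics.stdev computes the sample standard deviation and raises StatisticsError when a group has fewer than two values. Assuming you would rather report None for such groups, something like the following could work. The helper name summarize and the sample IPs are made up for illustration; they are not from the question.

```python
import statistics as stat


def summarize(pairs):
    """Group (ip, time) pairs by ip and return {ip: (mean, stdev)}.

    stdev is None for groups with a single sample, because
    statistics.stdev needs at least two data points.
    """
    grouped = {}
    for ip, time in pairs:
        grouped.setdefault(ip, []).append(time)

    summary = {}
    for ip, times in grouped.items():
        sd = stat.stdev(times) if len(times) > 1 else None
        summary[ip] = (stat.mean(times), sd)
    return summary


# Hypothetical data: the second IP has only one measurement.
sample = [("10.0.0.1", 1.0), ("10.0.0.1", 3.0), ("10.0.0.2", 5.0)]
result = summarize(sample)
print(result)
```

If you want the population standard deviation instead, statistics.pstdev is the corresponding function and accepts a single data point.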

Upvotes: 1

DevLounge

Reputation: 8447

@blorgon already showed you how to compute the stats in his answer.

However, nobody seems to have thought of using defaultdict, which is perfect for such a task. You can group your times by IP this way:

from collections import defaultdict

routers = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]


data = defaultdict(list)
for ip, time in routers:
    data[ip].append(time)

And to compute the stats, you can also use data.items() to iterate.
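
For completeness, here is a minimal sketch of that last step. It assumes the standard library's statistics module is acceptable and uses a few made-up IPs and times rather than the full list, so the numbers are easy to check by hand:

```python
from collections import defaultdict
import statistics

# Hypothetical sample data, not the values from the question.
routers = [
    ('10.0.0.1', 2.0),
    ('10.0.0.1', 4.0),
    ('10.0.0.1', 6.0),
    ('10.0.0.2', 10.0),
    ('10.0.0.2', 14.0),
]

# Group the times by IP; missing keys start as an empty list.
data = defaultdict(list)
for ip, time in routers:
    data[ip].append(time)

# Map each IP to (mean, sample standard deviation).
stats = {
    ip: (statistics.mean(times), statistics.stdev(times))
    for ip, times in data.items()
}

for ip, (mean, stdev) in stats.items():
    print(f"{ip}: mean={mean} stdev={stdev}")
```

Appending to the list in place avoids building a new list for every tuple, which matters once the input grows.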

Upvotes: 0

ObjectJosh

Reputation: 641

I believe this is what you're asking for. I added comments as an explanation.

routers = [('142.104.68.167', 11.111999999999853),
           ('142.104.68.167', 11.369000000000142),
           ('142.104.68.167', 11.618999999999915),
           ('142.104.68.1', 16.60699999999997),
           ('142.104.68.1', 16.847999999999956),
           ('142.104.68.1', 17.097000000000207),
           ('192.168.9.5', 15.727999999999838),
           ('192.168.9.5', 16.01800000000003),
           ('192.168.9.5', 16.279999999999973)]

# a dictionary for your routers
route_dict = {}

# loop through routers
for each in routers:
    # if the ip is already in the dictionary, add onto the existing value
    if each[0] in route_dict:
        route_dict[each[0]] += each[1]
    # if ip not already in the dictionary, add a new item into it
    else:
        route_dict[each[0]] = each[1]

# print the keys(rout) and values(sum) in the dictionary
for key, value in route_dict.items():
    print('rout: %s sum: %r' % (key, value))

Output:

rout: 142.104.68.167 sum: 34.09999999999991
rout: 142.104.68.1 sum: 50.552000000000135
rout: 192.168.9.5 sum: 48.02599999999984
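
If you also need the average, one possible extension is to track a count next to each running sum. This is only a sketch with made-up sample data; the names totals and averages are illustrative and not part of the code above:

```python
# Hypothetical sample data, not the values from the question.
routers = [('10.0.0.1', 2.0), ('10.0.0.1', 4.0), ('10.0.0.2', 9.0)]

# ip -> [running sum, number of samples]
totals = {}
for ip, time in routers:
    entry = totals.setdefault(ip, [0.0, 0])
    entry[0] += time
    entry[1] += 1

# Divide each sum by its count to get the per-IP mean.
averages = {ip: total / count for ip, (total, count) in totals.items()}

for ip, avg in averages.items():
    print('rout: %s average: %r' % (ip, avg))
```

A standard deviation cannot be recovered from the sum and count alone, so for that you would still need to keep the individual values (or a running sum of squares).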

Upvotes: 0

KetZoomer

Reputation: 2915

If you want the specified output, you can get it by looping through the list and collecting the times for each address in a dictionary. This also uses f-strings, which require Python 3.6 or later:

address_and_times = [
    ('142.104.68.167', 11.111999999999853),
    ('142.104.68.167', 11.369000000000142),
    ('142.104.68.167', 11.618999999999915),
    ('142.104.68.1', 16.60699999999997),
    ('142.104.68.1', 16.847999999999956),
    ('142.104.68.1', 17.097000000000207),
    ('192.168.9.5', 15.727999999999838),
    ('192.168.9.5', 16.01800000000003),
    ('192.168.9.5', 16.279999999999973)
]

tracking_dict = {}
for address in address_and_times:
    if address[0] not in tracking_dict:
        tracking_dict[address[0]] = [address[1]]
    else:
        tracking_dict[address[0]].append(address[1])
    print(f"{address[0]} rout: {address[0]} time: {address[1]} sum: {sum(tracking_dict[address[0]])}")

Upvotes: 0
