Reputation: 359
I have a list of lists and each list has the following items:
site, count, time
sample data: site1, 15, 20
I'm trying to figure out the best way to approach this. I want to add up the count and time for each site.
I thought about converting each list to a dictionary as I iterate through them, but I'm not sure what that gives me.
for site, count, time in lists:
    # create a dictionary, then what?
As the end result, I'd like either a list or a dictionary (some kind of data structure I can use) where the count and time for each site are added up into a "total" entry for that site.
Ex:
site, total_count, total_time
sample data:
site1, 50, 100 #all data for site1 added up
site2, 40, 300 #all data for site2 added up
Not looking for a coded answer, just the best way to get this done and a point in the right direction.
Upvotes: 2
Views: 80
Reputation: 2745
Here is a hacky approach (inspired by electrical engineering): use a Counter whose values are complex numbers, where the real part holds the time and the imaginary part holds the count. ;-)
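A minimal sketch of that trick, assuming the same (site, count, time) records used elsewhere in this thread. A Counter returns 0 for a missing key, so += works on a site's first record. Caveats: the parts of a complex number are floats, so the totals are exact only for integers up to 2**53, and Counter helpers that compare values (most_common with several keys, + between two Counters) raise TypeError on complex values.

```python
from collections import Counter

lists = [
    ('site1', 15, 20),
    ('site2', 10, 30),
    ('site1', 5, 25),
    ('site1', 30, 55),
    ('site2', 30, 270),
]

# Each value packs both totals into one complex number:
# real part = time, imaginary part = count.
totals = Counter()
for site, count, time in lists:
    totals[site] += complex(time, count)

# Unpack the complex sums into (total_count, total_time) tuples.
result = {site: (int(z.imag), int(z.real)) for site, z in totals.items()}
print(result)  # {'site1': (50, 100), 'site2': (40, 300)}
```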
Upvotes: 0
Reputation: 8740
I think the following would be a good approach to solve this problem.
import json  # For pretty-printing the dictionary
# List of lists where each sub-list contains site, count, time in order
data_list = [
    ["mysite1.com", 11, 88],
    ["mysite1.com", 7, 6],
    ["google.com", 6, 23],
    ["mysite2.com", 9, 12],
    ["google.com", 4, 7],
    ["mysite1.com", 9, 12],
    ["mysite2.com", 13, 4],
]
d = {}
for l in data_list:
    site, count, time = l  # Unpacking
    if site in d:
        # APPEND/UPDATE VALUES
        d[site]["count"].append(count)
        d[site]["time"].append(time)
    else:
        # CREATE NEW KEYS WITH DATA
        d[site] = {
            "count": [count],
            "time": [time]
        }
    # Recompute this site's totals after every record
    d[site]["total_count"] = sum(d[site]["count"])
    d[site]["total_time"] = sum(d[site]["time"])
print(json.dumps(d, indent=4))
# {
# "mysite1.com": {
# "count": [
# 11,
# 7,
# 9
# ],
# "time": [
# 88,
# 6,
# 12
# ],
# "total_count": 27,
# "total_time": 106
# },
# "google.com": {
# "count": [
# 6,
# 4
# ],
# "time": [
# 23,
# 7
# ],
# "total_count": 10,
# "total_time": 30
# },
# "mysite2.com": {
# "count": [
# 9,
# 13
# ],
# "time": [
# 12,
# 4
# ],
# "total_count": 22,
# "total_time": 16
# }
# }
Upvotes: 0
Reputation: 783
You said some kind of data structure, so maybe construct a DataFrame out of the lists that you have and then use groupby followed by sum to get what you want.
Example:
import pandas as pd
data = [['site1',15,20],['site1',35,80],['site2',15,20]]
df = pd.DataFrame(data, columns=['site', 'count', 'time'])
print(df.groupby('site').sum())
Output
       count  time
site
site1     50   100
site2     15    20
Alternatively:
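data = [['site1', 15, 20], ['site1', 35, 80], ['site2', 15, 20]]
data_d = {}
for rec in data:
    if rec[0] in data_d:
        # Site already seen: add this record's count and time to its totals
        data_d[rec[0]][0] += rec[1]
        data_d[rec[0]][1] += rec[2]
    else:
        # First record for this site: start its totals as a copy of [count, time]
        data_d[rec[0]] = rec[1:]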
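A possible variant of the loop above, assuming the same [site, count, time] record layout: collections.defaultdict creates the [0, 0] entry the first time a site is looked up, so the membership check goes away and the records are unpacked by name instead of by index.

```python
from collections import defaultdict

data = [['site1', 15, 20], ['site1', 35, 80], ['site2', 15, 20]]

# A site that has not been seen yet starts at [total_count, total_time] = [0, 0].
data_d = defaultdict(lambda: [0, 0])
for site, count, time in data:
    data_d[site][0] += count
    data_d[site][1] += time

print(dict(data_d))  # {'site1': [50, 100], 'site2': [15, 20]}
```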
Upvotes: 1
Reputation: 960
The question is still a little ambiguous, but you could, for example, build a class that holds a dictionary of dictionaries and aggregates the data incrementally each time you pass it a record, like this:
>>> class SiteAggregator:
...     def __init__(self):
...         self.sites = {}
...     def __call__(self, data):
...         site_name, site_counts, site_time = data
...         if site_name not in self.sites:
...             self.sites[site_name] = {'counts': 0, 'time': 0}
...         self.sites[site_name]['counts'] += site_counts
...         self.sites[site_name]['time'] += site_time
...
>>> site_agg = SiteAggregator()
>>> site_agg(['a', 20, 22])
>>> site_agg(['b', 10, 13])
>>> site_agg.sites['a']
{'counts': 20, 'time': 22}
>>> site_agg(['a', 10, 12])
>>> site_agg.sites['a']
{'counts': 30, 'time': 34}
>>> sites = [['a', 20, 10], ['b', 30, 15], ['c', 18, 22], ['a', 15, 22], ['b', 10, 2]]
>>> for site in sites:
...     site_agg(site)
...
>>> site_agg.sites['a']
{'counts': 65, 'time': 66}
Upvotes: 0
Reputation: 106455
You can iterate over the list of lists (better make it a list of tuples instead) and add the count and time to the total count and total time in the output dict with site as the key:
lists = [
('site1', 15, 20),
('site2', 10, 30),
('site1', 5, 25),
('site1', 30, 55),
('site2', 30, 270)
]
result = {}
for site, count, time in lists:
    total_count, total_time = result.get(site, (0, 0))
    result[site] = (total_count + count, total_time + time)
After the loop, result becomes:
{'site1': (50, 100), 'site2': (40, 300)}
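If you want the site, total_count, total_time rows from the question rather than a dictionary, the result can be flattened afterwards. A short sketch reusing the sample data above; sorting the rows by site name is an assumption, not something the question asks for.

```python
lists = [
    ('site1', 15, 20),
    ('site2', 10, 30),
    ('site1', 5, 25),
    ('site1', 30, 55),
    ('site2', 30, 270),
]

result = {}
for site, count, time in lists:
    total_count, total_time = result.get(site, (0, 0))
    result[site] = (total_count + count, total_time + time)

# Flatten {site: (total_count, total_time)} into (site, total_count, total_time) rows.
rows = [(site, total_count, total_time)
        for site, (total_count, total_time) in sorted(result.items())]

for row in rows:
    print(*row, sep=', ')
# site1, 50, 100
# site2, 40, 300
```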
Upvotes: 0