Cemre Mengü

Reputation: 18754

Reducing buckets by key and hour with Python

I have buckets of KPIs (Key Performance Indicators) with values in the following structure:

{
    A : [{x : [(hour, value),(hour, value)], y : [(hour, value)]}],
    B : [{d : [(hour, value),(hour, value)], e : [(hour, value)]}]
}

where A and B are buckets, and x, y, d, e are KPIs (keys), each with a list of (hour, value) tuples.

For each (bucket, key, hour), I want to find the sum and count such that:

{(Bucket, Key, Hour): (sum, count)}

What is the most concise and efficient way of doing this in Python? Most of the approaches I come up with for grouping by hour and reducing are really long.

Note that libraries such as numpy and pandas are available.

Upvotes: 0

Views: 63

Answers (1)

anki

Reputation: 765

Steps to succeed:

a) Flatten the nested structure into a list of (bucket, key, hour, value) tuples

b) Build a pandas DataFrame from that list

c) Group by (bucket, key, hour) and aggregate

import pandas as pd

t = {
    'A': [{'x': [(3, 1), (5, 2)], 'y': [(4, 1)]}],
    'B': [{'d': [(4, 3), (4, 1)], 'e': [(3, 2)]}]
}

# Flatten into (bucket, key, hour, value) tuples
t_flatten = [(a, b, c, d) for a in t for b, x in t[a][0].items() for c, d in x]
print(t_flatten)
# [('A', 'x', 3, 1), ('A', 'x', 5, 2), ('A', 'y', 4, 1),
#  ('B', 'd', 4, 3), ('B', 'd', 4, 1), ('B', 'e', 3, 2)]

df = pd.DataFrame(t_flatten, columns=['bucket', 'key', 'hour', 'value'])
df.groupby(['bucket', 'key', 'hour']).sum()    # sum per (bucket, key, hour)
df.groupby(['bucket', 'key', 'hour']).count()  # count per (bucket, key, hour)
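Since you want both the sum and the count, the two groupby calls can also be combined into a single aggregation, and the result converted into the exact {(Bucket, Key, Hour): (sum, count)} dictionary you asked for. A minimal sketch of that idea (the column names bucket/key/hour/value are my own, chosen for readability):

```python
import pandas as pd

t = {
    'A': [{'x': [(3, 1), (5, 2)], 'y': [(4, 1)]}],
    'B': [{'d': [(4, 3), (4, 1)], 'e': [(3, 2)]}]
}

# Flatten into (bucket, key, hour, value) tuples
rows = [(bucket, key, hour, value)
        for bucket, kpis in t.items()
        for key, pairs in kpis[0].items()
        for hour, value in pairs]

df = pd.DataFrame(rows, columns=['bucket', 'key', 'hour', 'value'])

# One pass: sum and count of 'value' per (bucket, key, hour)
agg = df.groupby(['bucket', 'key', 'hour'])['value'].agg(['sum', 'count'])

# Convert to {(bucket, key, hour): (sum, count)}
result = {idx: (row['sum'], row['count']) for idx, row in agg.iterrows()}
print(result)
# {('A', 'x', 3): (1, 1), ('A', 'x', 5): (2, 1), ('A', 'y', 4): (1, 1),
#  ('B', 'd', 4): (4, 2), ('B', 'e', 3): (2, 1)}
```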

Upvotes: 2
