Oli
Oli

Reputation: 317

Aggregate time series from list of dictionaries (Python)

I have a list of dictionaries generated as such:

all_series = []
    # loop
    ...
    all_series.append({"name": a.name, "sector": a.sector, "ts":a.ts})
    ...

name and sector are strings, ts is a pandas time series indexed by date.

Summing all time series together irrespective of name/sector is easy:

reduce(lambda x, y: x.add(y, fill_value=0), [a["ts"] for a in all_series])

Now I want to do this summing, but grouped by sector - i.e. I'd like to get one summed time series by sector and stored in some sensible way. I'm able to easily do this for one hard-coded sector of choice, but can you think of a good way of doing this in a more flexible way?

I guess ideally I get one data frame back, with one column per summed sector?

Upvotes: 0

Views: 136

Answers (2)

Oli
Oli

Reputation: 317

The suggested answer didn't work after all, because it didn't account for different lengths and start/end dates of the individual time series.

This is how I solved it in the end:

pd.concat({(a.sector, a.name): a.ts for i, a in all_series.iterrows()}, axis=1).groupby(axis=1, level=0).sum()

Thanks for the inspiration!

Upvotes: 0

rafaelc
rafaelc

Reputation: 59274

You can use default pd.DataFrame constructor, agg these series after groupby and then sum

pd.DataFrame(all_series).groupby('sector').ts.agg(tuple).transform(lambda k: sum(k)).T

sector                      sector1 sector2
2018-07-27 10:30:34.057044  102     40
2018-07-28 10:30:34.057044  91      36
2018-07-29 10:30:34.057044  153     163
2018-07-30 10:30:34.057044  119     114

Using sample data (if others want try other solutions):

all_series = []
now = datetime.datetime.now()
all_series.append({"name": 'name1',
                   "sector": 'sector1', 
                   "ts": pd.Series(np.random.randint(100, size=4), 
                                   index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name1',
                   "sector": 'sector1', 
                   "ts": pd.Series(np.random.randint(100, size=4), 
                                   index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name2',
                   "sector": 'sector2', 
                   "ts": pd.Series(np.random.randint(100, size=4), 
                                   index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name2',
                   "sector": 'sector2', 
                   "ts": pd.Series(np.random.randint(100, size=4), 
                                   index=pd.date_range(start=now, freq='D',periods=4))})

Upvotes: 1

Related Questions