Reputation: 317
I have a list of dictionaries generated as such:
all_series = []
# loop
...
all_series.append({"name": a.name, "sector": a.sector, "ts":a.ts})
...
name and sector are strings, ts is a pandas time series indexed by date.
Summing all time series together irrespective of name/sector is easy:
reduce(lambda x, y: x.add(y, fill_value=0), [a["ts"] for a in all_series])
Now I want to do this summing, but grouped by sector - i.e. I'd like to get one summed time series by sector and stored in some sensible way. I'm able to easily do this for one hard-coded sector of choice, but can you think of a good way of doing this in a more flexible way?
I guess ideally I get one data frame back, with one column per summed sector?
Upvotes: 0
Views: 136
Reputation: 317
The suggested answer didn't work after all, because it didn't account for different lengths and start/end dates of the individual time series.
This is how I solved it in the end:
pd.concat({(a.sector, a.name): a.ts for i, a in all_series.iterrows()}, axis=1).groupby(axis=1, level=0).sum()
Thanks for the inspiration!
Upvotes: 0
Reputation: 59274
You can use default pd.DataFrame
constructor, agg
these series after groupby
and then sum
pd.DataFrame(all_series).groupby('sector').ts.agg(tuple).transform(lambda k: sum(k)).T
sector sector1 sector2
2018-07-27 10:30:34.057044 102 40
2018-07-28 10:30:34.057044 91 36
2018-07-29 10:30:34.057044 153 163
2018-07-30 10:30:34.057044 119 114
Using sample data (if others want try other solutions):
all_series = []
now = datetime.datetime.now()
all_series.append({"name": 'name1',
"sector": 'sector1',
"ts": pd.Series(np.random.randint(100, size=4),
index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name1',
"sector": 'sector1',
"ts": pd.Series(np.random.randint(100, size=4),
index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name2',
"sector": 'sector2',
"ts": pd.Series(np.random.randint(100, size=4),
index=pd.date_range(start=now, freq='D',periods=4))})
all_series.append({"name": 'name2',
"sector": 'sector2',
"ts": pd.Series(np.random.randint(100, size=4),
index=pd.date_range(start=now, freq='D',periods=4))})
Upvotes: 1