hajons

Reputation: 135

Pandas: apply a function to a multiindexed series

I have a series 'incoming' that looks like this:

number.hash                               local_time         
19ace78686acf5772212d77595cb7efdb52788bf  2011-04-29 12:00:00    1
1a84708ae329e17438e8157165f91f3dec468eb6  2011-04-25 17:00:00    1
1f5b196086ca35e752eb39e4e348ae925d030af9  2011-02-16 14:00:00    1
                                          2011-02-16 15:00:00    0
                                          2011-02-16 16:00:00    0

where number.hash and local_time together form a MultiIndex. I want to apply an arbitrary function to each sub-series identified by number.hash alone, e.g. sum the values in each time series (the local_time/value pairs for one hash). I could collect the number.hash values and iterate over them, but there must be a cleaner and more efficient way.

Upvotes: 0

Views: 86

Answers (1)

Jeff

Reputation: 129018

In [35]: import pandas as pd

In [36]: s = pd.Series([1,1,1,0,0], pd.MultiIndex.from_tuples([
('A',pd.Timestamp('20110429 12:00:00')),
('B',pd.Timestamp('20110425 17:00:00')),
('C',pd.Timestamp('20110216 14:00:00')),
('C',pd.Timestamp('20110426 15:00:00')),
('C',pd.Timestamp('20110426 16:00:00'))]))


A  2011-04-29 12:00:00    1
B  2011-04-25 17:00:00    1
C  2011-02-16 14:00:00    1
   2011-04-26 15:00:00    0
   2011-04-26 16:00:00    0
dtype: int64

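Sum by level (level-wise reductions are vectorized and very fast). Note that the level argument to sum was deprecated in pandas 1.3 and removed in 2.0; on newer versions use s.groupby(level=0).sum() instead.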

In [37]: s.sum(level=0)
Out[37]: 
A    1
B    1
C    1
dtype: int64

Or group by the level and apply an arbitrary function:

In [38]: s.groupby(level=0).apply(lambda x: x.sum())
Out[38]: 
A    1
B    1
C    1
dtype: int64
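
As a rough, self-contained sketch of the same idea written as a plain script: the level names "number.hash" and "local_time" are taken from the question, the labels A, B and C stand in for the real hashes, and span_hours is a hypothetical helper used only to show a function that needs the timestamps as well as the values.

```python
import pandas as pd

# Toy data shaped like the question's series; "A", "B", "C" are
# placeholders for the real hashes.
index = pd.MultiIndex.from_tuples(
    [
        ("A", pd.Timestamp("2011-04-29 12:00:00")),
        ("B", pd.Timestamp("2011-04-25 17:00:00")),
        ("C", pd.Timestamp("2011-02-16 14:00:00")),
        ("C", pd.Timestamp("2011-02-16 15:00:00")),
        ("C", pd.Timestamp("2011-02-16 16:00:00")),
    ],
    names=["number.hash", "local_time"],
)
incoming = pd.Series([1, 1, 1, 0, 0], index=index)

# Built-in aggregation: one total per hash.
totals = incoming.groupby(level="number.hash").sum()


def span_hours(ts):
    """Hypothetical helper: hours between the first and last timestamp."""
    times = ts.index.get_level_values("local_time")
    return (times.max() - times.min()).total_seconds() / 3600


# Arbitrary function: each group arrives as a sub-series for one hash.
spans = incoming.groupby(level="number.hash").apply(span_hours)

print(totals)
print(spans)
```

Grouping by the level name rather than by position 0 is just a readability choice; both select the same level here.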

Upvotes: 3
