bphi
bphi

Reputation: 3195

Pandas MultiIndex Aggregation

I am trying to do some aggregation on a multi-indexDataFrame based on a DatetimeIndex generated from pandas.date_range.

My DatetimeIndex looks like this:

DatetimeIndex(['2000-05-30', '2000-05-31', '2000-06-01' ... '2001-1-31'])

And my multi-index DateFrame looks like this:

                     value
date          id    
2000-05-31    1        0
              2        1                   
              3        1
2000-06-30    2        1
              3        0
              4        0
2000-07-30    2        1
              4        0
              1        0
2002-09-30    1        1
              3        1

The dates in the DatetimeIndex may or may not be in the date index.

I need to retrieve all the id such that the percentage of value==1 is greater than or equal to some decimal threshold e.g. 0.6 for all the rows where the date for that id is in the DatetimeIndex.

For example if the threshold is 0.5, then the output should be [2, 3] or some DataFrame containing 2 and 3.

1 does not meet the requirement because 2002-09-30 is not in the DatetimeIndex.

I have a solution with loops and dictonaries to keep track of how often value==1 for each id, but it runs very slowly.

How can I utilize pandas to perform this aggregation?

Thank you.

Upvotes: 1

Views: 2781

Answers (1)

jezrael
jezrael

Reputation: 862921

You can use:

#define range
rng = pd.date_range('2000-05-30', '2000-7-01')

#filtering with isin
df = df[df.index.get_level_values('date').isin(rng)]

#get all treshes
s = df.groupby('id')['value'].mean()
print (s)
id
1    0.0
2    1.0
3    0.5
4    0.0
Name: value, dtype: float64

#get all values of index by tresh
a = s.index[s >= 0.5].tolist()
print (a)
[2, 3]

Upvotes: 2

Related Questions