Reputation: 3195
I am trying to do some aggregation on a multi-indexDataFrame
based on a DatetimeIndex
generated from pandas.date_range.
My DatetimeIndex
looks like this:
DatetimeIndex(['2000-05-30', '2000-05-31', '2000-06-01' ... '2001-1-31'])
And my multi-index DateFrame
looks like this:
value
date id
2000-05-31 1 0
2 1
3 1
2000-06-30 2 1
3 0
4 0
2000-07-30 2 1
4 0
1 0
2002-09-30 1 1
3 1
The dates in the DatetimeIndex
may or may not be in the date index.
I need to retrieve all the id
such that the percentage of value==1
is greater than or equal to some decimal threshold e.g. 0.6
for all the rows where the date for that id
is in the DatetimeIndex
.
For example if the threshold is 0.5
, then the output should be [2, 3]
or some DataFrame
containing 2
and 3
.
1
does not meet the requirement because 2002-09-30
is not in the DatetimeIndex
.
I have a solution with loops and dictonaries to keep track of how often value==1
for each id, but it runs very slowly.
How can I utilize pandas
to perform this aggregation?
Thank you.
Upvotes: 1
Views: 2781
Reputation: 862921
You can use:
#define range
rng = pd.date_range('2000-05-30', '2000-7-01')
#filtering with isin
df = df[df.index.get_level_values('date').isin(rng)]
#get all treshes
s = df.groupby('id')['value'].mean()
print (s)
id
1 0.0
2 1.0
3 0.5
4 0.0
Name: value, dtype: float64
#get all values of index by tresh
a = s.index[s >= 0.5].tolist()
print (a)
[2, 3]
Upvotes: 2