Reputation: 891
When I use apply with a user-defined function in pandas, it looks like Python is creating an additional array. How can I get rid of it? Here is my code:
import numpy as np
import pandas as pd

def fnc(group):
    # Keep only the negative values of column C for this group
    x = group.C.values
    out = x[x < 0]
    return pd.DataFrame(out)

data = pd.DataFrame({'A': np.random.randint(1, 3, 10),
                     'B': 3,
                     'C': np.random.normal(0, 1, 10)})
data.groupby(by=['A', 'B']).apply(fnc).reset_index()
A weird level_2 column is created from the index. Is there a way to avoid creating it when running my function?
   A  B  level_2            0
0  1  3        0 -1.054134802
1  1  3        1 -0.691996447
2  2  3        0 -1.068693768
3  2  3        1 -0.080342046
4  2  3        2 -0.181869799
Upvotes: 5
Views: 1783
Reputation: 32105
As it stands, there is no way to avoid level_2 appearing. This is because the result for each group is a dataframe with several rows in it: pandas understands that you want to broadcast these rows across the group keys, but it keeps the index of each returned dataframe as an additional level to guarantee coherent output. So explicitly dropping the last level (level=-1) at the end of your processing is expected.
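Dropping that last level could look something like the sketch below. It uses small fixed data instead of the random data from the question so the result is reproducible, and it selects column C before calling apply (my own choice, to keep the grouping columns out of the function's input); the names sample and result are just illustrative.

```python
import numpy as np
import pandas as pd

def fnc(group):
    # Keep only the negative values of column C for this group
    x = group.C.values
    return pd.DataFrame(x[x < 0])

# Fixed sample data (an assumption, replacing the random data in the question)
sample = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                       'B': 3,
                       'C': [-1.0, 0.5, -0.7, -1.1, 0.2, -0.1]})

result = (sample.groupby(by=['A', 'B'])[['C']]
                .apply(fnc)
                .droplevel(-1)   # drop the per-group index that becomes level_2
                .reset_index())  # turn A and B back into ordinary columns
print(result)
```

Calling reset_index(level=-1, drop=True) instead of droplevel(-1) should give the same result.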
If you want to avoid resetting that extra index but still do some post-processing, another way is to call transform instead of apply, and have fnc return the entire group vector with np.nan in place of the results to exclude. Then your dataframe will not have an extra level, but you'll need to call dropna() afterwards.
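A minimal sketch of the transform variant, under the same assumption of fixed sample data. The helper mask_non_negative and the names sample, masked, and result are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def mask_non_negative(x):
    # Return the whole group vector, with NaN wherever the value is not negative
    return x.where(x < 0)

# Fixed sample data (an assumption, replacing the random data in the question)
sample = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                       'B': 3,
                       'C': [-1.0, 0.5, -0.7, -1.1, 0.2, -0.1]})

# transform keeps the original index, so no extra level is created
masked = sample.groupby(by=['A', 'B'])['C'].transform(mask_non_negative)

# Drop the rows that were masked out
result = sample.assign(C=masked).dropna(subset=['C'])
print(result)
```

The result keeps the original row labels of the input, so a final reset_index(drop=True) may be wanted if a clean 0..n-1 index matters.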
Upvotes: 5