Reputation: 1351
I have the following code setup that calls and groupBy and apply on a Python Pandas DataFrame.
The bizarre thing is I am unable to slice the grouped data by row (like df.loc[2:5]
) without it completely screwing the output (as shown in the debug), how can you drop rows and get this to give the desired output?
Any help would be massively appreciated, I'm running this on a bigger example with more complicated functions, but have pinpointed the issues to the row slicing!
Code:
import pandas as pd
df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 2, 3, 4]})
def net_func(df):
df_res = daily_func(df, True)
df_res_valid = daily_func(df, False)
df_merge = pd.merge(df_res, df_res_valid)
return df_merge
def daily_func(df, bool_param):
# df.drop(df.head(1).index, inplace=True)
# df = df[1:1]
# df.iloc[1:1,:]
# df.loc[1:1,:]
if bool_param:
df['daily'+str(bool_param)] = 1
else:
df['daily'+str(bool_param)] = 0
return df
print df.groupby('one').apply(net_func)
Current output:
one two dailyTrue dailyFalse
one
AAL 0 AAL 1 1 0
1 AAL 2 1 0
AAPL 0 AAPL 1 1 0
1 AAPL 2 1 0
Desired output:
one two dailyTrue dailyFalse
one
AAL 1 AAL 2 1 0
AAPL 1 AAPL 2 1 0
Ideally, I would like to be able to slice by row for each group for example df.loc[3:5]
- This would be perfect!
I've tried the commented as follows:
output with df.drop(df.head(1).index, inplace=True)
:
Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []
Update: also tried output with df = df[1:1]
:
Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []
Update have also tried df.iloc[1:1,:]
:
one two dailyTrue dailyFalse
one
AAL 0 AAL 1 1 0
1 AAL 2 1 0
AAPL 0 AAPL 1 1 0
1 AAPL 2 1 0
and df.loc[1:1,:]
:
one two dailyTrue dailyFalse
one
AAL 0 AAL 1 1 0
1 AAL 2 1 0
AAPL 0 AAPL 1 1 0
1 AAPL 2 1 0
Upvotes: 5
Views: 18969
Reputation: 1696
Just use an iloc in groupby.apply:
groupby.apply(lambda group: group.iloc[start:stop:step])
Upvotes: 3
Reputation: 8277
I felt the need for slicing inside GroupBy
object and I have been doing so by applying this monkey patch:
def __groupby_slice( _grp, start=0, stop=None, step=1):
'''
Applies a slice to a GroupBy object
'''
return _grp.apply( lambda _df : _df.iloc[start:stop:step]).reset_index(drop=True)
pd.core.groupby.GroupBy.slice = __groupby_slice
Use as:
df.groupby('feature0').slice(-10, -3, 2)
Works with pandas==0.25.3
Upvotes: 4
Reputation: 107652
Consider using the cross section slice, xs
after the groupby().apply()
, specifying each key accordingly:
print df.groupby('one').apply(net_func).xs(0, level=1)
# one two dailyTrue dailyFalse
#one
#AAL AAL 1 1 0
#AAPL AAPL 1 1 0
print df.groupby('one').apply(net_func).xs(1, level=1)
# one two dailyTrue dailyFalse
#one
#AAL AAL 2 1 0
#AAPL AAPL 2 1 0
Alternatively, use multiple indexing with list of tuples:
print df.groupby('one').apply(net_func).ix[[('AAL', 1), ('AAPL', 1)]]
# one two dailyTrue dailyFalse
#one
#AAL 1 AAL 2 1 0
#AAPL 1 AAPL 2 1 0
Still more with slice (introduced in pandas 0.14):
print df.groupby('one').apply(net_func).loc[(slice('AAL','AAPL'),slice(1,1)),:]
# one two dailyTrue dailyFalse
#one
#AAL 1 AAL 2 1 0
#AAPL 1 AAPL 2 1 0
Upvotes: 3