Reputation: 1351

Python Pandas, slice rows from group in .groupby().apply()

I have the following code setup that calls and groupBy and apply on a Python Pandas DataFrame.

The bizarre thing is I am unable to slice the grouped data by row (like df.loc[2:5]) without it completely screwing the output (as shown in the debug), how can you drop rows and get this to give the desired output?

Any help would be massively appreciated, I'm running this on a bigger example with more complicated functions, but have pinpointed the issues to the row slicing!

Code:

import pandas as pd
df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 2, 3, 4]})

def net_func(df):
    df_res = daily_func(df, True)
    df_res_valid = daily_func(df, False)
    df_merge = pd.merge(df_res, df_res_valid)
    return df_merge

def daily_func(df, bool_param):

#     df.drop(df.head(1).index, inplace=True)
#     df = df[1:1]
#     df.iloc[1:1,:]
#     df.loc[1:1,:]


    if bool_param:
        df['daily'+str(bool_param)] = 1
    else:
        df['daily'+str(bool_param)] = 0    
    return df

print df.groupby('one').apply(net_func)

Current output:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0

Desired output:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  1   AAL    2          1           0
AAPL 1  AAPL    2          1           0

Ideally, I would like to be able to slice by row for each group for example df.loc[3:5] - This would be perfect!

I've tried the commented as follows:

output with df.drop(df.head(1).index, inplace=True):

Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []

Update: also tried output with df = df[1:1]:

Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []

Update have also tried df.iloc[1:1,:]:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0

and df.loc[1:1,:]:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0

Upvotes: 5

Answers (3)

Wesley Cheek

Reputation: 1696

Just use an iloc in groupby.apply:

groupby.apply(lambda group: group.iloc[start:stop:step])

Upvotes: 3

Learning is a mess

Reputation: 8277

I felt the need for slicing inside GroupBy object and I have been doing so by applying this monkey patch:

def __groupby_slice( _grp, start=0, stop=None, step=1):
    '''
    Applies a slice to a GroupBy object
    '''
    return _grp.apply( lambda _df : _df.iloc[start:stop:step]).reset_index(drop=True)

pd.core.groupby.GroupBy.slice = __groupby_slice

Use as:

df.groupby('feature0').slice(-10, -3, 2)

Works with pandas==0.25.3

Upvotes: 4

Parfait

Reputation: 107652

Consider using the cross section slice, xs after the groupby().apply(), specifying each key accordingly:

print df.groupby('one').apply(net_func).xs(0, level=1)
#       one  two  dailyTrue  dailyFalse
#one                                   
#AAL    AAL    1          1           0
#AAPL  AAPL    1          1           0

print df.groupby('one').apply(net_func).xs(1, level=1)
#       one  two  dailyTrue  dailyFalse
#one                                   
#AAL    AAL    2          1           0
#AAPL  AAPL    2          1           0

Alternatively, use multiple indexing with list of tuples:

print df.groupby('one').apply(net_func).ix[[('AAL', 1), ('AAPL', 1)]]
#         one  two  dailyTrue  dailyFalse
#one                                     
#AAL  1   AAL    2          1           0
#AAPL 1  AAPL    2          1           0

Still more with slice (introduced in pandas 0.14):

print df.groupby('one').apply(net_func).loc[(slice('AAL','AAPL'),slice(1,1)),:]
#         one  two  dailyTrue  dailyFalse
#one                                     
#AAL  1   AAL    2          1           0
#AAPL 1  AAPL    2          1           0

Upvotes: 3

Python Pandas, slice rows from group in .groupby().apply()

Answers (3)

Related Questions