Shubham R

Reputation: 7644

Getting specific rows in a grouped DataFrame (pandas)

I have a pandas DataFrame, euc_data, with the columns:

code1  code2 euclidean_distance

I wanted to get the top 50 rows for every code1 group, sorted by euclidean_distance. To do this I used:

matrix_top_50 = (euc_data.sort_values(['code1', 'euclidean_distance'])
                         .groupby('code1').head(50).reset_index(drop=True))
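To see what this does, here is a minimal sketch on a small, made-up euc_data (the values are hypothetical, and head(2) stands in for head(50) so the output stays short). Sorting by both columns puts each code1 group in ascending distance order, and groupby('code1').head(n) keeps the first n rows of each group in that order.

```python
import pandas as pd

# Hypothetical sample data with the same three columns as euc_data
euc_data = pd.DataFrame({
    'code1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'code2': ['x', 'y', 'z', 'x', 'y', 'z'],
    'euclidean_distance': [0.9, 0.1, 0.5, 0.3, 0.7, 0.2],
})

n = 2  # stands in for 50 on the real data

# Sort by group, then by distance, and keep the first n rows of each group
top_n = (euc_data.sort_values(['code1', 'euclidean_distance'])
                 .groupby('code1').head(n)
                 .reset_index(drop=True))
print(top_n)
```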

Now I want to create another matrix containing the next 100 rows for every code1 group, sorted by euclidean_distance.

For that I tried to use .iloc:

start = 50  # iloc is 0-based and end-exclusive: 50:150 are rows 51-150 of each group
end = 150
next_matrix = (euc_data.sort_values(['code1', 'euclidean_distance'])
                       .groupby('code1').iloc[start:end].reset_index(drop=True))

But I am getting this error:

Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method

How can I achieve this?

Upvotes: 2

Views: 75

Answers (2)

AndreyF

Reputation: 1838

There may be a better solution, but you can use apply, as the error message suggests:

next_matrix = (euc_data.sort_values(['code1', 'euclidean_distance'])
                       .groupby('code1')
                       .apply(lambda x: x.iloc[start:end])
                       .reset_index(drop=True))
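One alternative that avoids calling a Python function once per group (a different technique from this answer, sketched here on hypothetical data) is to number the rows inside each sorted group with GroupBy.cumcount and keep the positions in [start, end) with a boolean mask. cumcount is 0-based, so the mask selects the same rows as x.iloc[start:end] for each group. The sample values and start, end = 1, 3 are toy stand-ins.

```python
import pandas as pd

# Hypothetical sample data with the question's columns
euc_data = pd.DataFrame({
    'code1': ['a', 'a', 'a', 'a', 'b', 'b'],
    'code2': ['w', 'x', 'y', 'z', 'x', 'y'],
    'euclidean_distance': [0.4, 0.1, 0.3, 0.2, 0.6, 0.5],
})

start, end = 1, 3  # toy stand-ins for the real slice bounds

sorted_data = euc_data.sort_values(['code1', 'euclidean_distance'])

# 0-based position of each row within its code1 group, in sorted order
pos = sorted_data.groupby('code1').cumcount()

# Keep positions start <= pos < end, matching .iloc[start:end] per group
next_matrix = sorted_data[(pos >= start) & (pos < end)].reset_index(drop=True)
print(next_matrix)
```

With start, end = 50, 150 on the real data, the same mask would select the 100 rows of each group that come right after those kept by head(50). Group 'b' above has only two rows, so it contributes just one.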

Upvotes: 2

jezrael

Reputation: 863791

I think you need GroupBy.apply. Groups with fewer than end rows are not a problem: iloc slices allow out-of-range bounds, so such a group just contributes the rows it has (possibly none):

next_matrix = (euc_data.sort_values(['code1', 'euclidean_distance'])
                       .groupby('code1')
                       .apply(lambda x: x.iloc[start:end])
                       .reset_index(drop=True)
              )
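On the edge case of short groups: pandas documents that .iloc raises IndexError for out-of-bounds integer indexers except slices, which allow out-of-bounds bounds. A minimal check on a hypothetical two-row group shows the slice being clipped instead of failing:

```python
import pandas as pd

# Hypothetical group that has only two rows
group = pd.DataFrame({'code2': ['x', 'y'], 'euclidean_distance': [0.1, 0.2]})

partial = group.iloc[1:151]  # end past the last row: clipped to the rows that exist
empty = group.iloc[51:151]   # start past the last row: empty frame, no IndexError
print(len(partial), len(empty))
```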

Upvotes: 1
