Getting specific rows in a grouped DataFrame in pandas

Question

I have a pandas DataFrame, euc_data, which consists of the columns

code1  code2 euclidean_distance

I wanted to get the top 50 rows for every group of code1, sorted by euclidean_distance. To get this I used:

matrix_top_50 = (euc_data.sort_values(['code1', 'euclidean_distance'])
                         .groupby('code1').head(50).reset_index(drop=True))

Now I want to create another matrix with the next 100 rows for every group of code1, sorted by euclidean_distance.

For that I tried to use .iloc:

start = 51
end = 151
next_matrix = (euc_data.sort_values(['code1', 'euclidean_distance'])
                       .groupby('code1').iloc[start:end].reset_index(drop=True))

But I am getting this error:

Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method

How can I achieve this?

jezrael · Accepted Answer

I think you need GroupBy.apply. Slicing with iloc does not raise for out-of-range positions, so a group with fewer than end rows just returns whatever rows it has in that range (possibly none):

next_matrix = (euc_data.sort_values(['code1', 'euclidean_distance'])
                       .groupby('code1')
                       .apply(lambda x: x.iloc[start:end])
                       .reset_index(drop=True)
               )
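
As a possible alternative that avoids apply, here is a minimal sketch that numbers the rows inside each group with GroupBy.cumcount and filters on that position. The toy data and the small start/end values are made up for illustration only. Positions are zero-based, so if head(50) already covers positions 0 to 49, the following 100 rows would presumably be start = 50 and end = 150 rather than 51 and 151.

```python
import pandas as pd

# Hypothetical toy data: two groups of six rows, distances deliberately unsorted.
euc_data = pd.DataFrame({
    "code1": ["a"] * 6 + ["b"] * 6,
    "code2": list(range(6)) * 2,
    "euclidean_distance": [5.0, 1.0, 3.0, 0.5, 4.0, 2.0,
                           9.0, 7.0, 8.0, 6.0, 11.0, 10.0],
})

# Scaled-down stand-ins for 50 and 150: skip 2 rows per group, keep the next 3.
start, end = 2, 5

sorted_data = euc_data.sort_values(["code1", "euclidean_distance"])

# Zero-based position of each row inside its code1 group (0, 1, 2, ...).
position = sorted_data.groupby("code1").cumcount()

# Keep rows whose in-group position falls in the half-open range [start, end).
next_matrix = sorted_data[(position >= start) & (position < end)].reset_index(drop=True)

print(next_matrix)
```

Because this only builds a boolean mask over the sorted frame, all original columns (including code1) are kept, and groups shorter than end simply contribute fewer rows.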
