Reputation: 535
I have a dataframe with a column that I want to groupby and sort by a column value. After groupby I find there are multiple indexes, one of them is the index of origin dataframe, I want to delete this index.
A sample data frame:
> d = pd.DataFrame(np.array([[0, 0, 1, 1, 2, 2, 2],
[3, 4, 5, 6, 7, 8, 9],
[1.25, 10.1, 2.3, 2.4, 1.2, 5.5, 5.7]]).T,
columns=['a', 'b', 'c'])
> d
a b c
0 0.0 3.0 1.25
1 0.0 4.0 10.10
2 1.0 5.0 2.30
3 1.0 6.0 2.40
4 2.0 7.0 1.20
5 2.0 8.0 5.50
6 2.0 9.0 5.70
Function I want to apply:
def top_all(df,column='b'):
return df.sort_index(by=column,ascending=True)
How I use groupby:
d.groupby('a').apply(top_all)
The result I got:
a b c
a
0.0 0 0.0 3.0 1.25
1 0.0 4.0 10.10
1.0 2 1.0 5.0 2.30
3 1.0 6.0 2.40
2.0 4 2.0 7.0 1.20
5 2.0 8.0 5.50
6 2.0 9.0 5.70
I want to get result like this:
a b c
a
0.0 0.0 3.0 1.25
0.0 4.0 10.10
1.0 1.0 5.0 2.30
1.0 6.0 2.40
2.0 2.0 7.0 1.20
2.0 8.0 5.50
2.0 9.0 5.70
UPDATED:
I tried reset_index
with level, but the result didn't contain level. The result I want is in groupby
format, which means the value of column a
should be split in different group in index. I don't know if I make it clear...
> d.groupby('a').apply(top_all).reset_index(level=1, drop=True)
> d
a b c
a
0.0 0.0 3.0 1.25
0.0 0.0 4.0 10.10
1.0 1.0 5.0 2.30
1.0 1.0 6.0 2.40
2.0 2.0 7.0 1.20
2.0 2.0 8.0 5.50
2.0 2.0 9.0 5.70
Upvotes: 3
Views: 4452
Reputation: 14113
The only thing I can think of to accomplish this task would be to use openpyxl
. First save the output to excel with the multi-index using pandas
then delete the column using openpyxl
to maintain the format you are looking for.
# export multi-index DataFrame to excel
d.groupby('a').apply(top_all).to_excel('python/test.xlsx')
import openpyxl
# open xlsx doc
book = openpyxl.load_workbook('python/test.xlsx')
# use active sheet
sheet = book.active
# delete col
sheet.delete_cols(2)
#save book
book.save('python/test.xlsx')
If you do not want the index name displayed in excel (you will currently have duplicate columns: 'a'):
group = d.groupby('a').apply(top_all)
group.index.names = [None, None] # set index names to None
group.to_excel('python/test.xlsx')
# open xlsx doc
book = openpyxl.load_workbook('python/test.xlsx')
# use active sheet
sheet = book.active
# delete col
sheet.delete_cols(2)
#save book
book.save('python/test.xlsx')
Upvotes: 1