Reputation: 903
I'm working with a big dataset within Excel in which I'm trying to sort a number by top 25 per index value.
The datasite looks like this:
The Final PAC ID is the company number and changes (this does not show in the given data). The PAC contribution is the number I want to sort by.
So for example, there will be 50 contributions done by company C00003590, to different candidates with amount 'PAC contribution', I would like to sort the top 25 contributions done per company.
I've tried working with dictionaries, creating a dictionary for each company and adding in the candidate numbers as a string key, and the contribution as a value.
The code I have so far is the following (this might be the completely wrong way to go about it though):
import pandas as pd
df1 = pd.read_excel('Test2.xlsx')
dict_company = {}
k1 = str(df1['Final PAC ID'])
k2 = str(df1['Candidate ID'])
for each in range(0,100):
dict_company[k1)[each]] = {}
dict_company[k1)[each]] = k2[each]
if each % 50 == 0:
print(each)
print(dict_company)
for each in range(0,100):
dict_company[k1][k2][each] = round(float(k1[each]))
if each % 50:
print(each)
print(dict_company)
Upvotes: 3
Views: 118
Reputation: 862611
I think you need nlargest
:
df1 = df.groupby('Final PAC ID')['PAC contribution'].nlargest(50)
If need all columns:
cols = df.columns[~df.columns.isin(['PAC contribution','Final PAC ID'])].tolist()
df1 = df.set_index(cols)
.groupby('Final PAC ID')['PAC contribution']
.nlargest(50)
.reset_index()
Another solution (can be slowier):
df1 = df.sort_values('PAC contribution', ascending=False).groupby('Final PAC ID').head(50)
Last save to excel by to_excel
:
df1.to_excel('filename.xlsx')
Upvotes: 2
Reputation: 6663
You can use groupby
in conjunction with a dictionary comprehension here. The result
is a dictionary containing your company names as keys and the sub dataframes with top 25 payments as values:
def aggregate(sub_df):
return sub_df.sort_values('PAC contribution', ascending=False).head(25)
grouped = df.groupby('Final PAC ID')
results = {company: aggregate(sub_df)
for company, sub_df in grouped}
Upvotes: 0