Reputation: 143
I have the following data in dataframe
Folder Name ID File Name
Test 2 2_Accounts.csv
Test 4 4_Contacts.csv
Test 2 2_Contacts.csv
Test 4 4_Accounts.csv
Test 4 4_Leads.csv
Test 2 2_Leads.csv
Test 2 2_Opps.csv
Test 4 4_Opps.csv
I first need to sort by ID so ID's with same value are grouped together. I did the following
df = df.sort_values("ID")
Folder Name ID File Name
Test 2 2_Leads.csv
Test 2 2_Accounts.csv
Test 2 2_Contacts.csv
Test 2 2_Opps.csv
Test 4 4_Accounts.csv
Test 4 4_Opps.csv
Test 4 4_Leads.csv
Test 4 4_Contacts.csv
Next I need to sort within groups(id) in a way that "Contacts" should come first, "Accounts" should come second, and the rest can follow in any order
The should output should look something like this
Folder Name ID File Name
Test 2 2_Contacts.csv
Test 2 2_Accounts.csv
Test 2 2_Opps.csv
Test 2 2_Leads.csv
Test 4 4_Contacts.csv
Test 4 4_Accounts.csv
Test 4 4_Leads.csv
Test 4 4_Opps.csv
Upvotes: 1
Views: 600
Reputation: 120391
Create a temporary column to set custom order: 0 -> Contact, 1 -> Accounts, * -> the rest
df['order'] = df['File Name'].replace({r'Contact': 0,
r'Accounts': 1,
r'.*': 2}, regex=True)
out = df.sort_values(['ID', 'order']).drop(columns='order')
>>> out
Folder Name ID File Name
2 Test 2 2_Contacts.csv
0 Test 2 2_Accounts.csv
5 Test 2 2_Leads.csv
6 Test 2 2_Opps.csv
1 Test 4 4_Contacts.csv
3 Test 4 4_Accounts.csv
4 Test 4 4_Leads.csv
7 Test 4 4_Opps.csv
Upvotes: 1
Reputation: 23146
Try assigning a pseudo column that forces "Contacts" to be ranked above others. Like so:
>>> df.assign(file_type=~df["File Name"].str.contains("Contacts")*1) \
.sort_values(["ID", "file_type"]) \
.drop("file_type",axis=1) \
.reset_index(drop=True)
Folder Name ID File Name
0 Test 2 2_Contacts.csv
1 Test 2 2_Accounts.csv
2 Test 2 2_Leads.csv
3 Test 2 2_Opps.csv
4 Test 4 4_Contacts.csv
5 Test 4 4_Accounts.csv
6 Test 4 4_Leads.csv
7 Test 4 4_Opps.csv
Upvotes: 0
Reputation: 1169
You can sort on more than one column at a time to get what part way to you want.
df = df.sort_values(by=['ID', 'File Name'])
2_A comes before 2_C so you could create a new column that alters the File Name text to come up with the order you want, for example, create a temp column that strips out the 2_ and then delete the column if you don't need it.
df['sort file name col'] = df['File Name'].str.replace('\\A[0-9]_', '')
df = df.sort_values(by=['ID', 'sort file name col'])
del df['sort file name col']
Upvotes: 0