Reputation: 143

Sort and arrange values in dataframe in a specific order

I have the following data in dataframe

Folder Name   ID     File Name        
Test          2      2_Accounts.csv      
Test          4      4_Contacts.csv         
Test          2      2_Contacts.csv
Test          4      4_Accounts.csv
Test          4      4_Leads.csv
Test          2      2_Leads.csv
Test          2      2_Opps.csv
Test          4      4_Opps.csv

I first need to sort by ID so ID's with same value are grouped together. I did the following

df = df.sort_values("ID")
Folder Name   ID     File Name        
Test          2      2_Leads.csv    
Test          2      2_Accounts.csv         
Test          2      2_Contacts.csv
Test          2      2_Opps.csv
Test          4      4_Accounts.csv
Test          4      4_Opps.csv
Test          4      4_Leads.csv
Test          4      4_Contacts.csv

Next I need to sort within groups(id) in a way that "Contacts" should come first, "Accounts" should come second, and the rest can follow in any order

The should output should look something like this

Folder Name   ID     File Name        
Test          2      2_Contacts.csv      
Test          2      2_Accounts.csv         
Test          2      2_Opps.csv
Test          2      2_Leads.csv
Test          4      4_Contacts.csv
Test          4      4_Accounts.csv
Test          4      4_Leads.csv
Test          4      4_Opps.csv

Upvotes: 1

Answers (3)

Corralien

Reputation: 120559

Create a temporary column to set custom order: 0 -> Contact, 1 -> Accounts, * -> the rest

df['order'] = df['File Name'].replace({r'Contact': 0,
                                       r'Accounts': 1,
                                       r'.*': 2}, regex=True)

out = df.sort_values(['ID', 'order']).drop(columns='order')

>>> out
  Folder Name  ID       File Name
2        Test   2  2_Contacts.csv
0        Test   2  2_Accounts.csv
5        Test   2     2_Leads.csv
6        Test   2      2_Opps.csv
1        Test   4  4_Contacts.csv
3        Test   4  4_Accounts.csv
4        Test   4     4_Leads.csv
7        Test   4      4_Opps.csv

Upvotes: 1

not_speshal

Reputation: 23166

Try assigning a pseudo column that forces "Contacts" to be ranked above others. Like so:

>>> df.assign(file_type=~df["File Name"].str.contains("Contacts")*1) \
      .sort_values(["ID", "file_type"]) \
      .drop("file_type",axis=1) \
      .reset_index(drop=True)

  Folder Name  ID       File Name
0        Test   2  2_Contacts.csv
1        Test   2  2_Accounts.csv
2        Test   2     2_Leads.csv
3        Test   2      2_Opps.csv
4        Test   4  4_Contacts.csv
5        Test   4  4_Accounts.csv
6        Test   4     4_Leads.csv
7        Test   4      4_Opps.csv

Create a column called "file_type" that assigns a value of 0 if the file has the word "Contacts" in it and 0 otherwise.
Sort the dataframe by columns "ID" and "file_type"
Drop the created column
Reset index

Upvotes: 0

Hammurabi

Reputation: 1179

You can sort on more than one column at a time to get what part way to you want.

df = df.sort_values(by=['ID', 'File Name'])

2_A comes before 2_C so you could create a new column that alters the File Name text to come up with the order you want, for example, create a temp column that strips out the 2_ and then delete the column if you don't need it.

df['sort file name col'] = df['File Name'].str.replace('\\A[0-9]_', '')
df = df.sort_values(by=['ID', 'sort file name col'])
del df['sort file name col']

Upvotes: 0

Sort and arrange values in dataframe in a specific order

Answers (3)

Related Questions