CrazyCSGuy
CrazyCSGuy

Reputation: 143

Sort and arrange values in dataframe in a specific order

I have the following data in dataframe

Folder Name   ID     File Name        
Test          2      2_Accounts.csv      
Test          4      4_Contacts.csv         
Test          2      2_Contacts.csv
Test          4      4_Accounts.csv
Test          4      4_Leads.csv
Test          2      2_Leads.csv
Test          2      2_Opps.csv
Test          4      4_Opps.csv

I first need to sort by ID so ID's with same value are grouped together. I did the following

df = df.sort_values("ID")
Folder Name   ID     File Name        
Test          2      2_Leads.csv    
Test          2      2_Accounts.csv         
Test          2      2_Contacts.csv
Test          2      2_Opps.csv
Test          4      4_Accounts.csv
Test          4      4_Opps.csv
Test          4      4_Leads.csv
Test          4      4_Contacts.csv

Next I need to sort within groups(id) in a way that "Contacts" should come first, "Accounts" should come second, and the rest can follow in any order

The should output should look something like this

Folder Name   ID     File Name        
Test          2      2_Contacts.csv      
Test          2      2_Accounts.csv         
Test          2      2_Opps.csv
Test          2      2_Leads.csv
Test          4      4_Contacts.csv
Test          4      4_Accounts.csv
Test          4      4_Leads.csv
Test          4      4_Opps.csv

Upvotes: 1

Views: 600

Answers (3)

Corralien
Corralien

Reputation: 120391

Create a temporary column to set custom order: 0 -> Contact, 1 -> Accounts, * -> the rest

df['order'] = df['File Name'].replace({r'Contact': 0,
                                       r'Accounts': 1,
                                       r'.*': 2}, regex=True)

out = df.sort_values(['ID', 'order']).drop(columns='order')
>>> out
  Folder Name  ID       File Name
2        Test   2  2_Contacts.csv
0        Test   2  2_Accounts.csv
5        Test   2     2_Leads.csv
6        Test   2      2_Opps.csv
1        Test   4  4_Contacts.csv
3        Test   4  4_Accounts.csv
4        Test   4     4_Leads.csv
7        Test   4      4_Opps.csv

Upvotes: 1

not_speshal
not_speshal

Reputation: 23146

Try assigning a pseudo column that forces "Contacts" to be ranked above others. Like so:

>>> df.assign(file_type=~df["File Name"].str.contains("Contacts")*1) \
      .sort_values(["ID", "file_type"]) \
      .drop("file_type",axis=1) \
      .reset_index(drop=True)

  Folder Name  ID       File Name
0        Test   2  2_Contacts.csv
1        Test   2  2_Accounts.csv
2        Test   2     2_Leads.csv
3        Test   2      2_Opps.csv
4        Test   4  4_Contacts.csv
5        Test   4  4_Accounts.csv
6        Test   4     4_Leads.csv
7        Test   4      4_Opps.csv
  1. Create a column called "file_type" that assigns a value of 0 if the file has the word "Contacts" in it and 0 otherwise.
  2. Sort the dataframe by columns "ID" and "file_type"
  3. Drop the created column
  4. Reset index

Upvotes: 0

Hammurabi
Hammurabi

Reputation: 1169

You can sort on more than one column at a time to get what part way to you want.

df = df.sort_values(by=['ID', 'File Name'])

2_A comes before 2_C so you could create a new column that alters the File Name text to come up with the order you want, for example, create a temp column that strips out the 2_ and then delete the column if you don't need it.

df['sort file name col'] = df['File Name'].str.replace('\\A[0-9]_', '')
df = df.sort_values(by=['ID', 'sort file name col'])
del df['sort file name col']

Upvotes: 0

Related Questions