user13412850
user13412850

Reputation: 519

python - how do I perform the below operation in dataframe

I had dataframe a part of which is shown below

df1 = pd.DataFrame({
                   'id1': ['676PA','676RA','197PA','197RA','199PA','199RA','834PA','834RA','950PA','950RA','952PA','952RA','953PA','953RA','954PA','954RA','956PA','956RA'],
                    'Category1' : ['P-L','FL123','P-L','FL123','P-L','FL123','P-L','FL123','FL123','P-L','P-L','FL123','FL123','P-L','FL123','P-L','P-L','FL123'],
                    'Val1' : [-4.0,39.0,-8.0,45.0,-4.0,27.0,-46.0,271.0,-70.0,3.0,-34.0,192.0,-56.0,3.0,-56.0,3.0,-52.0,292.0]
})

each id has two entries "PA" and "RA" with corresponding Category and Val as shown below

the sequencing of Category is incorrect and I want it to be consistent for all ids. I want P-L to come first followed by FL123 for each of the ids. I have shown the "current" and how I want to see the output below. Any help is much appreciated.

enter image description here

Upvotes: 0

Views: 65

Answers (1)

Shep
Shep

Reputation: 91

Edit - keep historical order of IDs

Since you want to keep the order of you IDs (so you want to maintain 676, 199 etc.) you need to create some sort of count that will count each ID with the same number (i.e 676R AND 676PA both equal 0, the next ID for both RA and PA will equal 1 and so on...).

What you can do is very similar, you just need a different temporary column:

  1. Create a new temp column using .groupby() and .cumcount() where you group on the Category1 so that it counts each category one after the other.
  2. Sort on this new new ID column and the Category1 column :)
(
    df1
    .assign(temp = df1.groupby('Category1').cumcount())
    .sort_values(['temp','Category1'], ascending=[True, False])
)

Note: This will only work if you have two categories for each ID

Original - If order retention isn't needed

Like said in the comments, if it does not matter in which order the IDs appear (i.e. 197PA, 197RA can come first (but together) in your entire list) then you can use a sort. However, since you need to sort using the digits only, you need to do this in two steps (optional third):

  1. Create a new column (say temp_id) where you replace 'PA' to be 'RA' (or vice-versa)
  2. Sort by new_id ascending and Category1 descending
  3. Optional: Remove the temp_id column as you no longer need it.
(
    df1.assign(temp_id = df1['id1'].str[:-2]) 
    .sort_values(['temp_id','Category1'], ascending=[True, False])
#     .drop('temp_id', axis=1)
)

Upvotes: 2

Related Questions