Reputation: 519
I had dataframe a part of which is shown below
df1 = pd.DataFrame({
'id1': ['676PA','676RA','197PA','197RA','199PA','199RA','834PA','834RA','950PA','950RA','952PA','952RA','953PA','953RA','954PA','954RA','956PA','956RA'],
'Category1' : ['P-L','FL123','P-L','FL123','P-L','FL123','P-L','FL123','FL123','P-L','P-L','FL123','FL123','P-L','FL123','P-L','P-L','FL123'],
'Val1' : [-4.0,39.0,-8.0,45.0,-4.0,27.0,-46.0,271.0,-70.0,3.0,-34.0,192.0,-56.0,3.0,-56.0,3.0,-52.0,292.0]
})
each id has two entries "PA" and "RA" with corresponding Category and Val as shown below
the sequencing of Category is incorrect and I want it to be consistent for all ids. I want P-L to come first followed by FL123 for each of the ids. I have shown the "current" and how I want to see the output below. Any help is much appreciated.
Upvotes: 0
Views: 65
Reputation: 91
Edit - keep historical order of IDs
Since you want to keep the order of you IDs (so you want to maintain 676, 199 etc.) you need to create some sort of count that will count each ID with the same number (i.e 676R AND 676PA both equal 0, the next ID for both RA and PA will equal 1 and so on...).
What you can do is very similar, you just need a different temporary column:
.groupby()
and .cumcount()
where you group on the Category1
so that it counts each category one after the other.Category1
column :)(
df1
.assign(temp = df1.groupby('Category1').cumcount())
.sort_values(['temp','Category1'], ascending=[True, False])
)
Note: This will only work if you have two categories for each ID
Original - If order retention isn't needed
Like said in the comments, if it does not matter in which order the IDs appear (i.e. 197PA, 197RA can come first (but together) in your entire list) then you can use a sort. However, since you need to sort using the digits only, you need to do this in two steps (optional third):
temp_id
) where you replace 'PA' to be 'RA' (or vice-versa)new_id
ascending and Category1
descendingtemp_id
column as you no longer need it.(
df1.assign(temp_id = df1['id1'].str[:-2])
.sort_values(['temp_id','Category1'], ascending=[True, False])
# .drop('temp_id', axis=1)
)
Upvotes: 2