Reputation: 3
Using Iris dataset as an example I am interested in sorting by species order (so that it comes as 1,2,3,1,2,3...) then by descending order of petal length as illustrated in expected output.
% dist column, is calculated by the sum of petal length for each species divided by total petal length across whole dataframe.
As you can see that species order is higher priority than order of petal length and you may have more number in particular species name.
Expected output:
petal length (cm) | petal width (cm) | sepal length (cm) | sepal width (cm) | Species | % dist | Species order |
---|---|---|---|---|---|---|
6.2 | 3.5 | 5.3 | 6.5 | virginica | 0.19 | 1 |
5.6 | 2.0 | 6.6 | 3.0 | setosa | 0.39 | 2 |
9.1 | 7.9 | 7.7 | 5.2 | versicolor | 0.42 | 3 |
4.0 | 5.5 | 0.2 | 7.8 | virginica | 0.19 | 1 |
5.0 | 8.8 | 1.4 | 2.5 | setosa | 0.39 | 2 |
7.0 | 9.8 | 8.3 | 3.1 | versicolor | 0.42 | 3 |
4.3 | 1.3 | 5.9 | 2.8 | setosa | 0.39 | 2 |
3.2 | 7.6 | 5.6 | 7.9 | versicolor | 0.42 | 3 |
4.2 | 0.7 | 4.4 | 6.9 | setosa | 0.39 | 2 |
2.3 | 5.0 | 2.1 | 7.2 | versicolor | 0.42 | 3 |
1.5 | 0.2 | 5.0 | 3.4 | setosa | 0.39 | 2 |
0.5 | 0.4 | 7.3 | 9.0 | versicolor | 0.42 | 3 |
I have tried using sort_values() in python but it doesn't quite work out, as all species are sorted by species order and not in a rotate format as I would like. My actual dataframe has over a million rows so manually rotating it is not a feasible solution. Any suggestions on how this can be done in python would be great. Thank you!
Upvotes: 0
Views: 68
Reputation: 2605
IIUC correctly you're sorting by petal length descending and then you want to alternate each species in the data frame.
Loading the data frame from sklearn, this has a different order of species than your example, but the approach is similar
from sklearn import datasets
iris = datasets.load_iris()
df = pd.DataFrame(np.hstack((iris['data'], iris['target'].reshape(iris['target'].shape[0],1))),columns= iris['feature_names'] + ['order'])
Sort and then alternate rows:
final_df, order = [], []
# df = df.sample(frac=1) # shuffle the rows if needed
for name,group in df.groupby('order'):
group = group.sort_values(['petal length (cm)'], ascending=False)
group = group.reset_index().drop(columns='index')
final_df.append(group)
order.append(name)
# You can sort by whatever "order" you want
final_df = [final_df[int(i)] for i in order]
out = pd.concat(final_df).sort_index(kind='merge')
print(out[['petal length (cm)', 'order']].head(15))
Output:
petal length (cm) order
0 1.9 0.0
0 5.1 1.0
0 6.9 2.0
1 1.9 0.0
1 5.0 1.0
1 6.7 2.0
2 1.7 0.0
2 4.9 1.0
2 6.7 2.0
3 1.7 0.0
3 4.9 1.0
3 6.6 2.0
4 1.7 0.0
4 4.8 1.0
4 6.4 2.0
Upvotes: 0