Reputation: 13
folks. I've been through all the questions related to sorting columns with pivot tables but couldn't find something just as I needed. I have a dataframe of this kind:
Date Moisture Accum Year DayOfYear
0 2000-01-01 0.408640 0.408640 2000 1
1 2000-01-02 0.433425 0.842065 2000 2
2 2000-01-03 0.429745 1.271810 2000 3
3 2000-01-04 0.427589 1.699399 2000 4
4 2000-01-05 0.428700 2.128098 2000 5
And I created a Pivot table from it and calculated another column from the existing data:
mean1 = pd.pivot_table(c1, index = 'DayOfYear', columns = 'Year', values = 'Moisture')
mean1['Mean'] = mean1.mean(axis = 1)
I obtained something like this:
Year 2000 2001 2002 ... 2018 2019 Mean
DayOfYear ...
1 0.408640 0.433016 0.420326 ... 0.423164 0.328385 0.401896
2 0.433425 0.423607 0.414502 ... 0.419587 0.322804 0.398434
3 0.429745 0.418132 0.404171 ... 0.417384 0.318795 0.396913
4 0.427589 0.407190 0.394478 ... 0.420361 0.316989 0.398425
5 0.428700 0.401072 0.386432 ... 0.417026 0.313664 0.396777
I want to sort the values for each year, but I haven't been able to make it happen. I've tried this:
mean1 = mean1.sort_values('2000', ascending = True, axis = 0)
But I get KeyError: '2000'
. I also tried sorting by the value I made the pivot table from ('Moisture'), as responses to other questions recommended, but it keeps showing a similar error.
If I try to sort the values from the 'Mean' column I do get the sorted column, but it can't be done (apparently) for the year columns (e.g. '2000').
What am I missing?
Upvotes: 1
Views: 1102
Reputation: 1501
If you want to sort every column independently, you may try this solution, which is the same as:
import numpy as np
mean2 = pd.DataFrame(np.sort(mean1.values, axis=0), index=mean1.index, columns=mean1.columns)
Which will give you a DataFrame with each column sorted individually.
Year 2000 2001 2002 2018 2019 Mean
DayOfYear
1 0.408640 0.401072 0.386432 0.417026 0.313664 0.389379
2 0.427589 0.407190 0.394478 0.417384 0.316989 0.393321
3 0.428700 0.418132 0.404171 0.419587 0.318795 0.397645
4 0.429745 0.423607 0.414502 0.420361 0.322804 0.402706
5 0.433425 0.433016 0.420326 0.423164 0.328385 0.402785
But now the index doesn't make sense at all, since all cells were reordered. So maybe you'll want to reindex it.
Upvotes: 0
Reputation: 30971
Year column in your source DataFrame is most likely of int type,
so the respective column in the pivot table has also "integer" (not
"string") name (run mean1.info()
to check column types).
So the first, mandatory correction is to change the first parameter to integer 2000.
Two another correction are in my opinion advisable, but not required: Default values of ascending and axis parameters are True and 0, respectively, so if you want to keep your code shorter, you can omit them.
So replace the offending line with:
mean1 = mean1.sort_values(2000)
Upvotes: 1