Take the following CSV input (not all data points are included, for brevity):
"Date","Production"
"1962-01",589
"1962-02",561
...
"1975-11",797
"1975-12",843
I am trying to show this data as a boxplot grouped by month. But instead of labelling the x-axis 01, 02, ..., 11, 12, I want it to show January, February, ...
To do this, I put the data into a DataFrame, converted 'Date' with pd.to_datetime, and then set it as the index:
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index("Date")
Then I created two new columns, 'Month' and 'Alph_Months':
df["Month"] = df.index.month
df["Alph_Months"] = df.index.strftime('%B')
At this point the DataFrame looks like this:
Production Month Alph_Months
Date
1962-01-01 589 1 January
1962-02-01 561 2 February
1962-03-01 640 3 March
1962-04-01 656 4 April
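A self-contained version of the setup steps above, built only from the rows that actually appear in the question (the elided rows are simply left out), can serve as a reproducible starting point. The variable name csv_text is just for illustration:

```python
import io

import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
csv_text = '''"Date","Production"
"1962-01",589
"1962-02",561
"1962-03",640
"1962-04",656
"1975-11",797
"1975-12",843
'''

df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"])  # "1962-01" is parsed as 1962-01-01
df = df.set_index("Date")

df["Month"] = df.index.month                  # 1..12, sorts in calendar order
df["Alph_Months"] = df.index.strftime("%B")   # full month name, e.g. "January"
print(df)
```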
To create a boxplot, I have tried the following:
df[['Production', 'Alph_Months']].boxplot(figsize=(16,6),by='Alph_Months', grid=True);
However, this returns the groups in alphabetical order (April, August, December, ...) instead of calendar order (January, February, March, ...).
Is there any way to have the boxplot order the boxes by the Month column but label them with the Alph_Months values?
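The alphabetical order is consistent with how pandas groups data: groupby sorts its keys by default (sort=True), and month-name strings sort lexicographically rather than by calendar position. A minimal sketch with placeholder Production values (not the question's data):

```python
import calendar

import pandas as pd

# One row per month; the Production values are placeholders, not real data.
df = pd.DataFrame({
    "Production": range(12),
    "Alph_Months": calendar.month_name[1:],  # 'January' ... 'December' under the default C locale
})

# groupby sorts its keys by default, so string keys come out alphabetically
order = [name for name, _ in df.groupby("Alph_Months")]
print(order[:3])  # ['April', 'August', 'December']
```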
You could try plt.xticks, assuming you have the line:
import matplotlib.pyplot as plt
The xticks function lets you rename the x-axis ticks on your graph, so to label the boxplot ticks with month names you can do something like:
plt.xticks([1, 2, 3, ...], ['Jan', 'Feb', ...])
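Put this line after the call that plots your boxplot. I notice that you are using the boxplot method of a DataFrame; I'm not sure whether plt.xticks works with that, but it definitely works with boxplots drawn by seaborn and matplotlib. Keep in mind that plt.xticks only renames the ticks and does not reorder the boxes, so group by the numeric Month column (which sorts as 1 to 12) rather than by Alph_Months; otherwise the alphabetically ordered boxes would get the wrong month names.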
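If you want to shorten it a bit, you can replace [1, 2, 3, ...] with range(1, 13). Those positions match matplotlib's default box positions; seaborn draws categorical boxes at positions 0 to 11, so there you would use range(12) instead.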
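A minimal sketch of this approach with the DataFrame method from the question: group by the numeric Month column, then relabel the ticks with the Alph_Months values. It assumes pandas draws the twelve boxes at matplotlib's default positions 1 to 12. The Production values are random placeholders, not the question's data, and the Agg backend line is only there so the sketch runs without a display (drop it in a notebook):

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: one synthetic value per month, 1962-01 to 1975-12.
# These numbers are random stand-ins, NOT the real series from the question.
rng = np.random.default_rng(0)
dates = pd.date_range("1962-01-01", "1975-12-01", freq="MS")
df = pd.DataFrame({"Production": rng.integers(550, 850, size=len(dates))}, index=dates)
df.index.name = "Date"
df["Month"] = df.index.month
df["Alph_Months"] = df.index.strftime("%B")

# Group by the numeric Month column so the boxes appear in calendar order (1..12)
df.boxplot(column="Production", by="Month", figsize=(16, 6), grid=True)

# One label per month, taken from Alph_Months in Month order
month_labels = df.groupby("Month")["Alph_Months"].first().tolist()

# Relabel the ticks; assumes the boxes sit at matplotlib's default positions 1..12
plt.xticks(range(1, len(month_labels) + 1), month_labels)
```

Taking the labels from the data rather than hard-coding them keeps the tick text consistent with whatever strftime produced; calendar.month_abbr[1:] would work equally well for short names.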