MessitÖzil

Reputation: 1378

How to manually select which x-axis labels (dates) get plotted in pandas

First of all, I am sorry if I am not describing the problem correctly, but the example should make my issue clear.

I have this dataframe and need to plot it sorted by date. There are many dates (around 60), so pandas automatically chooses which dates to label on the x-axis, and the chosen dates look random. For readability I also want to label only selected dates, but following a pattern, such as January of every year.

This is my code:

df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 32543]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if df1.index.contains('entry'):
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if df2.index.contains('entry'):
    df2.T[['entry','sum']].plot(rot = 30)
else:
    df2.T[['sum']].plot(kind = 'bar')
ax1 = plt.axes()
ax1.legend(["Seitenzugriffe", "Dateiabrufe"])
plt.xlabel("")
plt.savefig('image.png')

This is the plot

As you can see, the plot has 2010-08, 2013-09 and 2014-07 as x-axis labels. How can I make it show something like 2010-01, 2013-01, 2014-01, etc.?

Thank you very much. I know this is not the best description, but since English is not my first language this is the best I could come up with.

Upvotes: 1

Views: 1010

Answers (1)

tdube

Reputation: 2553

NOTE: Updated to answer OP question more directly.

You are mixing three things: pandas plotting, the matplotlib pyplot API (the plt calls), and the matplotlib object-oriented API (methods on an axes object, ax1 above). The pyplot and object-oriented APIs are two distinct APIs, and they may not work correctly when mixed. The matplotlib documentation recommends using the object-oriented API:

"While it is easy to quickly generate plots with the matplotlib.pyplot module, we recommend using the object-oriented approach for more control and customization of your plots. See the methods in the matplotlib.axes.Axes() class for many of the same plotting functions. For examples of the OO approach to Matplotlib, see the API Examples."
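As a rough illustration of staying within the object-oriented API while still letting pandas do the drawing, here is a minimal sketch. The `demo` frame and its values are made up for the example and are not the OP's data; the idea is only that the Axes is created once and every later call goes through it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly counts standing in for the real data (assumption)
demo = pd.DataFrame(
    {"entry": [5, 8, 6, 9], "sum": [12, 15, 11, 20]},
    index=["2010-01", "2010-02", "2010-03", "2010-04"],
)

# Create the figure and axes once, then hand the axes to pandas
fig, ax = plt.subplots()
demo.plot(ax=ax, rot=30)

# All further customization goes through the same axes object
ax.legend(["Seitenzugriffe", "Dateiabrufe"])
ax.set_xlabel("")
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
```

With this pattern, `fig.savefig('image.png')` would take the place of `plt.savefig('image.png')`, and there is no need to call `plt.axes()` to get hold of the axes afterwards.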

Here's how you can control the x-axis tick positions and labels using matplotlib's date locators and formatters (see matplotlib example) with the object-oriented API. Also see the link in @ImportanceOfBeingErnest's answer to another question about incompatibilities between pandas' and matplotlib's datetime objects.

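# imports used below
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# prepare your data
# (on pandas >= 1.3, use on_bad_lines='skip' in place of error_bad_lines=False)
df = pd.read_csv('../../../so/dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])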
df.head()
df1 = df[df['Resource_ID'] == 10021]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1.index:  # Index.contains() was removed in pandas 1.0
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = pd.concat([df1, df2.to_frame().T])  # DataFrame.append() was removed in pandas 2.0
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if 'entry' in df2.index:  # Index.contains() was removed in pandas 1.0
    # convert your index to use pandas datetime format
    df3 = df2.T[['entry','sum']].copy()
    df3.index = pd.to_datetime(df3.index)
    # for illustration, I changed a couple dates and added some dummy values
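    # a single .loc[row, column] call avoids chained assignment
    df3.loc[pd.Timestamp('2014-01-01'), 'entry'] = 48
    df3.loc[pd.Timestamp('2014-05-01'), 'entry'] = 28
    df3.loc[pd.Timestamp('2015-05-01'), 'entry'] = 36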
    print(df3)

    # plot your data
    fig, ax = plt.subplots()

    # use matplotlib date formatters
    years = mdates.YearLocator()   # every year
    yearsFmt = mdates.DateFormatter('%Y-%m')

    # format the major ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)

    ax.plot(df3)

    # add legend
    ax.legend(["Seitenzugriffe", "Dateiabrufe"])

    fig.savefig('image.png')
else:
    # left as an exercise...
    df2.T[['sum']].plot(kind = 'bar')
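For the bar-plot branch, one possible approach (a sketch, not necessarily the only way) relies on the fact that pandas draws bars at integer positions 0..n-1, so date locators do not apply directly. Instead, the ticks can be restricted to the positions whose date falls in January. The `totals` series below is synthetic and only stands in for the 'sum' row.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic monthly totals from 2010-08 to 2012-07 (assumption, not the OP's data)
months = pd.date_range("2010-08-01", periods=24, freq="MS")
totals = pd.Series(range(24), index=months, name="sum")

fig, ax = plt.subplots()
totals.plot(kind="bar", ax=ax)  # bars are placed at integer positions 0..n-1

# Keep a tick only at the bars whose date is in January
january_positions = [i for i, d in enumerate(totals.index) if d.month == 1]
january_labels = [totals.index[i].strftime("%Y-%m") for i in january_positions]

ax.set_xticks(january_positions)
ax.set_xticklabels(january_labels, rotation=30)
```

Here the only labelled bars are those for 2011-01 and 2012-01. The same position filter could use any other rule, for example `d.month in (1, 7)` to label January and July.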

Upvotes: 1
