berkes

Reputation: 27593

Skip intermediate datetime values on X-axis in Pandas render

I have a DataFrame that I group by hour and count, which turns it into a Series. When I plot that Series, the x-axis is completely garbled and unreadable.

Summarized in code:

bicycles = both_directions.query('type == "BICYCLE"')
display(bicycles.info())

timegroups = bicycles.groupby(pd.Grouper(key='date_time', axis=0, freq="1H", sort=True)).count()['date']
display(timegroups)
display(type(timegroups.index))
timegroups.plot(kind="bar", stacked=True)

Which outputs:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2025 entries, 0 to 3588
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   date_time    2025 non-null   datetime64[ns]
 1   speed        2025 non-null   int64         
 2   time         2025 non-null   object        
 3   date         2025 non-null   object        
 4   direction    2025 non-null   int64         
 5   length       2025 non-null   float64       
 6   length_norm  2025 non-null   int64         
 7   speed_norm   2025 non-null   int64         
 8   type         2025 non-null   string        
dtypes: datetime64[ns](1), float64(1), int64(4), object(2), string(1)
memory usage: 158.2+ KB

None

date_time
2022-06-01 14:00:00     1
2022-06-01 15:00:00    11
2022-06-01 16:00:00     3
2022-06-01 17:00:00     8
2022-06-01 18:00:00     2
                       ..
2022-06-13 09:00:00     0
2022-06-13 10:00:00     5
2022-06-13 11:00:00    13
2022-06-13 12:00:00    12
2022-06-13 13:00:00    13
Freq: H, Name: date, Length: 288, dtype: int64

pandas.core.indexes.datetimes.DatetimeIndex

<matplotlib.axes._subplots.AxesSubplot at 0x7fcd133c3a90>

[Figure: garbled x-axis output from matplotlib]

What is the way to (smartly) skip values so that X-axis labels remain readable?

According to the pandas documentation, this should already happen automatically by default:

"pandas includes automatic tick resolution adjustment for regular frequency time-series data."

But it clearly doesn't in this case. What am I doing wrong? Is there a setting or conversion I'm missing? Is it a type issue (Series vs. DataFrame)?

Upvotes: 1

Views: 613

Answers (1)

Laurent

Reputation: 13518

Given the following toy dataframe mimicking yours (one value per hour during one year), but with duplicated values (each hour in date_time column is repeated twice):

import random

import pandas as pd

# pd.date_range already returns a DatetimeIndex, so no pd.to_datetime call is needed
df = pd.DataFrame(
    {"date_time": pd.date_range(start="1/1/2021", end="12/31/2021", freq="H")}
)
df["count"] = [int(random.random() * 100) for _ in range(df.shape[0])]
df = pd.concat([df, df]).reset_index(drop=True)  # Add duplicates
df.info()
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17474 entries, 0 to 17473
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_time  17474 non-null  datetime64[ns]
 1   count      17474 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 273.2 KB

The x-axis labels are unreadable when the dataframe is plotted as is (this would also be the case without the duplicated values in the date_time column):

df.plot(x="date_time", kind="bar", stacked=True)

Output:

[Figure: bar plot with overlapping, unreadable x-axis labels]
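
The reason for the clutter, as far as I can tell, is that pandas bar plots treat the x-axis as categorical: every bar gets its own tick and label at an integer position, so the automatic date tick adjustment used for line plots does not apply. Here is a minimal sketch that checks this on a small hypothetical Series (the variable names and the 48-hour range are made up for illustration):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import pandas as pd

# Hypothetical hourly series: 48 values, one per hour
# (older pandas versions spell the hourly alias "H" instead of "h")
s = pd.Series(
    range(48),
    index=pd.date_range("2022-06-01", periods=48, freq="h"),
)

ax = s.plot(kind="bar")

# One tick per bar, placed at integer positions rather than at date values
tick_positions = list(ax.get_xticks())
tick_labels = [t.get_text() for t in ax.get_xticklabels()]

print(len(tick_positions), len(tick_labels))
```

With 288 hourly bars, as in the question, that means 288 labels competing for space on a single axis.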

One way to fix that is to set the xticks manually, for instance to the index of each month-end day (found with pandas asfreq), and then set the labels by chaining a call to set_xticklabels with the corresponding datetime values. Duplicated values are dropped at each step, so that ticks and labels remain unique even though all values are still plotted:

# Index of end of months dates in df
df = df.sort_values(by="date_time").reset_index(drop=True)
end_of_months = (
    df[
        df["date_time"].isin(
            df.drop_duplicates(subset=["date_time"])
            .set_index(["date_time"])
            .asfreq("M")
            .reset_index()["date_time"]
        )
    ]
    .drop_duplicates(subset=["date_time"])
    .index.tolist()
)

df.plot(x="date_time", kind="bar", stacked=True, xticks=end_of_months).set_xticklabels(
    df.loc[end_of_months, "date_time"].dt.strftime("%Y-%m-%d").unique(),
    rotation=45,
    ha="right",
)

Output:

[Figure: the same bar plot with one readable, rotated x-axis label per month end]
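
As an alternative to picking tick positions by hand, you could skip the pandas bar plot and draw the bars with matplotlib directly, so the x-axis holds real dates and a date locator can thin out the labels. This is a different technique from the one above, shown only as a rough sketch: the `hourly` Series is a hypothetical stand-in for the hourly counts in the question, and the choice of `AutoDateLocator` with `ConciseDateFormatter` is my assumption about what reads well, not something the question requires.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the hourly counts: 288 hourly values
# (older pandas versions spell the hourly alias "H" instead of "h")
hourly = pd.Series(
    range(288),
    index=pd.date_range("2022-06-01 14:00", periods=288, freq="h"),
    name="count",
)

fig, ax = plt.subplots(figsize=(10, 4))

# On a date axis, bar widths are expressed in days, so 1/24 is one hour
ax.bar(hourly.index.to_numpy(), hourly.to_numpy(), width=1 / 24, align="edge")

# Let matplotlib choose a sensible number of date ticks and short labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

fig.canvas.draw()
n_major_ticks = len(ax.get_xticks())
print(n_major_ticks)
```

The trade-off is that you lose the pandas conveniences such as `stacked=True`, so stacking several columns would have to be done by hand with the `bottom` argument of `ax.bar`.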

Upvotes: 1
