Reputation: 27593
I have a DataFrame that I group and sum by hour, which turns it into a Series. When I plot that Series, the x-axis labels are completely garbled and unreadable.
Summarized in code:
bicycles = both_directions.query('type == "BICYCLE"')
display(bicycles.info())
timegroups = bicycles.groupby(pd.Grouper(key='date_time', axis=0, freq="1H", sort=True)).count()['date']
display(timegroups)
display(type(timegroups.index))
timegroups.plot(kind="bar", stacked=True)
Which outputs:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2025 entries, 0 to 3588
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date_time 2025 non-null datetime64[ns]
1 speed 2025 non-null int64
2 time 2025 non-null object
3 date 2025 non-null object
4 direction 2025 non-null int64
5 length 2025 non-null float64
6 length_norm 2025 non-null int64
7 speed_norm 2025 non-null int64
8 type 2025 non-null string
dtypes: datetime64[ns](1), float64(1), int64(4), object(2), string(1)
memory usage: 158.2+ KB
None
date_time
2022-06-01 14:00:00 1
2022-06-01 15:00:00 11
2022-06-01 16:00:00 3
2022-06-01 17:00:00 8
2022-06-01 18:00:00 2
..
2022-06-13 09:00:00 0
2022-06-13 10:00:00 5
2022-06-13 11:00:00 13
2022-06-13 12:00:00 12
2022-06-13 13:00:00 13
Freq: H, Name: date, Length: 288, dtype: int64
pandas.core.indexes.datetimes.DatetimeIndex
<matplotlib.axes._subplots.AxesSubplot at 0x7fcd133c3a90>
What is the way to (smartly) skip values so that X-axis labels remain readable?
According to the pandas documentation, it should already do this automatically by default:
"pandas includes automatic tick resolution adjustment for regular frequency time-series data."
But it clearly doesn't in this case. What am I doing wrong? Is there a setting or conversion I'm missing? Is it a type issue (Series vs. DataFrame)?
Upvotes: 1
Views: 613
Reputation: 13518
Given the following toy dataframe mimicking yours (one value per hour for one year), but with duplicated values (each hour in the date_time column is repeated twice):
import random
import pandas as pd
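df = pd.DataFrame(
    {
        # date_range already returns datetimes, so no pd.to_datetime wrapper is needed
        # (on pandas >= 2.2 the hourly alias is spelled "h"; "H" is deprecated there)
        "date_time": pd.date_range(start="1/1/2021", end="12/31/2021", freq="H")
    }
)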
df["count"] = [int(random.random() * 100) for _ in range(df.shape[0])]
df = pd.concat([df, df]).reset_index(drop=True) # Add duplicates
df.info()
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17474 entries, 0 to 17473
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date_time 17474 non-null datetime64[ns]
1 count 17474 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 273.2 KB
The x-axis labels are unreadable when plotting the dataframe as is (note that this would also be the case without duplicated values in the date_time column):
df.plot(x="date_time", kind="bar", stacked=True)
Output:
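As far as I can tell, the labels pile up because a pandas bar plot treats the x values as categories: it places one tick (and one label) per bar, so the automatic tick resolution adjustment for time series does not apply here. A minimal sketch to check that, assuming a small hypothetical Series named counts (not your data) and a headless matplotlib backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 48 hourly counts
hours = pd.date_range("2021-01-01", periods=48, freq=pd.Timedelta(hours=1))
counts = pd.Series(np.random.default_rng(0).integers(0, 100, size=48), index=hours)

ax = counts.plot(kind="bar")

# Compare the number of x ticks with the number of bars
n_ticks = len(ax.get_xticks())
n_bars = len(counts)
print(n_ticks, n_bars)
```

If the two numbers match, every bar gets its own label, which is what makes the axis unreadable once there are hundreds of rows.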
One way to fix that is to set xticks manually, for instance to the index of each month-end day (found with pandas asfreq), and then set the labels accordingly by chaining a call to set_xticklabels with the corresponding datetime values. At each step, duplicated values are taken into account so that, even though all values are plotted in the end, ticks and labels remain unique:
# Index of end of months dates in df
df = df.sort_values(by="date_time").reset_index(drop=True)
end_of_months = (
    df[
        df["date_time"].isin(
            df.drop_duplicates(subset=["date_time"])
            .set_index(["date_time"])
            .asfreq("M")  # month end; spelled "ME" on pandas >= 2.2
            .reset_index()["date_time"]
        )
    ]
    .drop_duplicates(subset=["date_time"])
    .index.tolist()
)
# Put ticks only at the month-end rows and label them with the matching dates
df.plot(x="date_time", kind="bar", stacked=True, xticks=end_of_months).set_xticklabels(
    df.loc[end_of_months, "date_time"].dt.strftime("%Y-%m-%d").unique(),
    rotation=45,
    ha="right",
)
Output:
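Another option, not the approach used above, is to skip the pandas bar plot and draw the bars with matplotlib directly, so that the x-axis is a real date axis and a date locator can thin the ticks on its own. This is only a sketch under assumptions: timegroups below is a hypothetical stand-in for an hourly Series like the one in the question, and the bar width of 1/24 assumes one bar per hour (matplotlib date units are days).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for an hourly count Series (288 hours = 12 days)
hours = pd.date_range("2022-06-01 14:00", periods=288, freq=pd.Timedelta(hours=1))
timegroups = pd.Series(
    np.random.default_rng(0).integers(0, 15, size=288), index=hours
)

fig, ax = plt.subplots(figsize=(10, 4))

# Plot against real datetimes; width is in days, so 1/24 is one hour
ax.bar(timegroups.index.to_numpy(), timegroups.to_numpy(), width=1 / 24, align="edge")

# Let matplotlib choose a sensible number of date ticks and compact labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

n_ticks = len(ax.get_xticks())
```

With this variant there is no need to compute tick positions by hand, at the cost of losing the pandas-specific options such as stacked=True.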
Upvotes: 1