How to plot data with gaps into subplots

Question

I have a dataframe with gaps in the dates:

                                    temperature 
    data                                                        
    2016-01-01 01:00:00              -8.2 
    2016-01-01 02:00:00              -8.3  
    2016-01-01 03:00:00              -9.1 
    2016-01-01 04:00:00              -9.1  
    2016-01-01 05:00:00              -9.6 
        ...                           ...     
    2020-02-29 20:00:00               5.9   
    2020-02-29 21:00:00               5.4   
    2020-02-29 22:00:00               4.7 
    2020-02-29 23:00:00               4.3 
    2020-03-01 00:00:00               4.3

Here is the code for some sample data, different from mine but the concept is the same:

def tworzeniedaty():
    import pandas as pd
    rng1 = list(pd.date_range(start='2016-01-01', end='2016-02-29', freq='D'))
    rng2 = list(pd.date_range(start='2016-12-15', end='2017-02-28', freq='D'))
    rng3 = list(pd.date_range(start='2017-12-15', end='2018-02-28', freq='D'))
    rng4 = list(pd.date_range(start='2018-12-15', end='2019-02-28', freq='D'))
    rng5 = list(pd.date_range(start='2019-12-15', end='2020-02-29', freq='D'))
    return rng1 + rng2 + rng3 + rng4 + rng5


import random
import pandas as pd

lista = [random.randrange(1, 10, 1) for i in range(len(tworzeniedaty()))]
df = pd.DataFrame({'Date': tworzeniedaty(), 'temperature': lista})
df['Date'] = pd.to_datetime(df['Date'], format="%Y/%m/%d")

When I plot the data, I get a very messy plot.

Instead, I would like to plot each period without gaps into its own subplot.

This is the same question as "How to plot only specific months in a time series of several years?", but I would like to do it in Python and cannot decipher the R code.

Mr. T · Accepted Answer

We can group the data by calculating the difference between the month numbers of consecutive dates and checking whether it exceeds a limit such as three:

from matplotlib import pyplot as plt
import random
import pandas as pd

def tworzeniedaty():
    rng1 = list(pd.date_range(start='2016-01-01', end='2016-02-29', freq='D'))
    rng2 = list(pd.date_range(start='2016-12-15', end='2017-02-28', freq='D'))
    rng3 = list(pd.date_range(start='2017-12-15', end='2018-02-28', freq='D'))
    rng4 = list(pd.date_range(start='2018-12-15', end='2019-02-28', freq='D'))
    rng5 = list(pd.date_range(start='2019-12-15', end='2020-02-29', freq='D'))
    return rng1 + rng2 + rng3 + rng4 + rng5

lista = [random.randrange(1, 10, 1) for i in range(len(tworzeniedaty()))]
df = pd.DataFrame({'Date': tworzeniedaty(), 'temperature': lista})


# assuming that the df is sorted by date, we look for jumps of more than 3
# in the month number (e.g. February -> December) and
# label the resulting groups with consecutive numbers
df["groups"] = (df["Date"].dt.month.diff() > 3).cumsum()
n = 1 + df["groups"].max()

# creating the desired number of subplots
# squeeze=False keeps axes an array even if there is only one group
fig, axes = plt.subplots(1, n, figsize=(15, 5), sharey=True, squeeze=False)

# plotting each group into a subplot
for (_, group_df), ax in zip(df.groupby("groups"), axes.flat):
    ax.plot(group_df["Date"], group_df["temperature"])

fig.autofmt_xdate(rotation=45)
plt.tight_layout()
plt.show()

Sample output:

Some fine-tuning is necessary if there are more groups. In that case, a grid would be appropriate: one can create a subplot grid and remove the unused subplots, as in this matplotlib example. The x-axis labels probably also need adjusting with a matplotlib Locator and Formatter for a better appearance. Some of this can be automated by passing the grouping variable as hue in seaborn; however, this may lead to a different set of problems.
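
The fine-tuning mentioned above could look roughly like the following sketch. It is not part of the original answer, and several things in it are assumptions: the helper names `label_groups` and `plot_groups_grid`, the 90-day threshold, the three-column grid, and the `"%b %Y"` label format are all made up for illustration. It also swaps the month-number comparison for a comparison of actual time differences, because a jump such as November to March has a month difference of 3 - 11 = -8 and would not be detected by `> 3`.

```python
import math
import random

import matplotlib.dates as mdates
import pandas as pd
from matplotlib import pyplot as plt


def label_groups(dates, max_gap=pd.Timedelta(days=90)):
    """Number the groups consecutively; a new group starts after each gap > max_gap.

    Assumes `dates` is a datetime Series sorted in ascending order.
    The 90-day default is an arbitrary choice for this sample data.
    """
    return (dates.diff() > max_gap).cumsum()


def plot_groups_grid(df, ncols=3):
    """Plot each group of `df` into its own cell of a subplot grid (hypothetical helper)."""
    grouped = df.groupby("groups")
    n = grouped.ngroups
    nrows = math.ceil(n / ncols)
    # squeeze=False keeps `axes` a 2D array even for a single row or column
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows),
                             sharey=True, squeeze=False)
    for (_, group_df), ax in zip(grouped, axes.ravel()):
        ax.plot(group_df["Date"], group_df["temperature"])
        # one tick per month, labelled e.g. "Jan 2017"
        ax.xaxis.set_major_locator(mdates.MonthLocator())
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
        ax.tick_params(axis="x", labelrotation=45)
    # remove the grid cells that did not receive a group
    for ax in axes.ravel()[n:]:
        ax.remove()
    return fig


# sample data with the same date ranges as in the question
ranges = [("2016-01-01", "2016-02-29"), ("2016-12-15", "2017-02-28"),
          ("2017-12-15", "2018-02-28"), ("2018-12-15", "2019-02-28"),
          ("2019-12-15", "2020-02-29")]
dates = pd.concat([pd.Series(pd.date_range(start, end, freq="D"))
                   for start, end in ranges], ignore_index=True)
rng = random.Random(0)  # seeded only to make the sample reproducible
df = pd.DataFrame({"Date": dates,
                   "temperature": [rng.randrange(1, 10) for _ in range(len(dates))]})

df["groups"] = label_groups(df["Date"])
fig = plot_groups_grid(df, ncols=3)
fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure. `fig.autofmt_xdate()` is deliberately not used here, because in a multi-row grid it hides the x tick labels of every subplot that is not in the bottom row, which is unwanted when each subplot has its own date range.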
