elksie5000
elksie5000

Reputation: 7772

How to create groupby subplots in Pandas?

I've got a dataframe with timeseries data of crime with a facet on offence (which looks like the format below). What I'd like to perform a groupby plot on the dataframe so that it's possible to explore trends in crime over time.

    Offence                     Rolling year total number of offences       Month
0   Criminal damage and arson   1001                                        2003-03-31
1   Drug offences               66                                         2003-03-31
2   All other theft offences    617                                   2003-03-31
3   Bicycle theft               92                                    2003-03-31
4   Domestic burglary           282                                   2003-03-31

I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?

I would turn to Seaborn but I can't work out how to format the xlabel as timeseries.

subs = []
for idx, (i, g) in enumerate(df.groupby("Offence")):
        subs.append({"data": g.set_index("Month").resample("QS-APR", how="sum").ix["2010":],
                     "title":i})

ax = plt.figure(figsize=(25,15))
for i,g in enumerate(subs):
    plt.subplot(5, 5, i)
    plt.plot(g['data'])
    plt.title(g['title'])
    plt.xlabel("Time")
    plt.ylabel("No. of crimes")
    plt.tight_layout()

Upvotes: 7

Views: 11773

Answers (3)

tdy
tdy

Reputation: 41417

The idiomatic way to plot groups in pandas is actually to pivot the groups into columns (not groupby):

table = df.pivot_table(index='Month', columns='Offence', values='Total', aggfunc=sum)

# Offence     Arson  Bicycle  Burglary  Drug  Theft  Violent
# Month                                                     
# 2020-01-31    525      435       303   363    653      762
# 2020-02-29    180      146       438   177    498      381
# 2020-03-31    375      546       421   485    233      277
# ...
# 2021-12-31    329      344       490   550    592      385

Pandas can then automatically generate subplots per column in a given layout:

table.plot(subplots=True, layout=(3, 2))

df.plot subplots


I would turn to seaborn but I can't work out how to format the xlabel as timeseries.

For seaborn, keep the data in long form without pivoting. Assuming the dates have already been converted to_datetime, format the tick labels using autofmt_xdate:

g = sns.relplot(kind='line', data=df, x='Month', y='Total',
                hue='Offence', col='Offence', estimator=sum,
                col_wrap=2, height=2, aspect=2, legend=False)
g.fig.autofmt_xdate()

sns.relplot

Upvotes: 2

Nipun Batra
Nipun Batra

Reputation: 11387

Altair can work great in such cases.

import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql

df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year

from altair import *

Viz #1- No color by year/No columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)

enter image description here

Viz #2- No color by year/columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)

enter image description here

Viz #3- Color by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)

enter image description here

Upvotes: 2

Sergey Bushmanov
Sergey Bushmanov

Reputation: 25249

This is a reproducible example of 6 scatterplots in Pandas, obtained from pd.groupby() for 6 consecutive years. On x axis -- there is oil price (brent) for the year, on y -- the value for sp500 for the same year.

import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
%matplotlib inline

brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna()["2009":"2015"]

fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
    group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=year)

This produces the below plot:

enter image description here

(Just in case, from these plots you may infer there was a strong correlation between oil and sp500 in 2010 but not in other years).

You may change kind in group.plot() so that it suits your specific kind or data. My anticipation, pandas will preserve the date formatting for x-axis if you have it in your data.

Upvotes: 14

Related Questions