Reputation: 7772
I've got a dataframe with timeseries data of crime with a facet on offence (which looks like the format below). What I'd like to perform a groupby plot on the dataframe so that it's possible to explore trends in crime over time.
Offence Rolling year total number of offences Month
0 Criminal damage and arson 1001 2003-03-31
1 Drug offences 66 2003-03-31
2 All other theft offences 617 2003-03-31
3 Bicycle theft 92 2003-03-31
4 Domestic burglary 282 2003-03-31
I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?
I would turn to Seaborn but I can't work out how to format the xlabel as timeseries.
subs = []
for idx, (i, g) in enumerate(df.groupby("Offence")):
subs.append({"data": g.set_index("Month").resample("QS-APR", how="sum").ix["2010":],
"title":i})
ax = plt.figure(figsize=(25,15))
for i,g in enumerate(subs):
plt.subplot(5, 5, i)
plt.plot(g['data'])
plt.title(g['title'])
plt.xlabel("Time")
plt.ylabel("No. of crimes")
plt.tight_layout()
Upvotes: 7
Views: 11773
Reputation: 41417
The idiomatic way to plot groups in pandas is actually to pivot the groups into columns (not groupby):
table = df.pivot_table(index='Month', columns='Offence', values='Total', aggfunc=sum)
# Offence Arson Bicycle Burglary Drug Theft Violent
# Month
# 2020-01-31 525 435 303 363 653 762
# 2020-02-29 180 146 438 177 498 381
# 2020-03-31 375 546 421 485 233 277
# ...
# 2021-12-31 329 344 490 550 592 385
Pandas can then automatically generate subplots per column in a given layout:
table.plot(subplots=True, layout=(3, 2))
I would turn to seaborn but I can't work out how to format the xlabel as timeseries.
For seaborn, keep the data in long form without pivoting. Assuming the dates have already been converted to_datetime
, format the tick labels using autofmt_xdate
:
g = sns.relplot(kind='line', data=df, x='Month', y='Total',
hue='Offence', col='Offence', estimator=sum,
col_wrap=2, height=2, aspect=2, legend=False)
g.fig.autofmt_xdate()
Upvotes: 2
Reputation: 11387
Altair can work great in such cases.
import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql
df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year
from altair import *
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)
Upvotes: 2
Reputation: 25249
This is a reproducible example of 6 scatterplots in Pandas, obtained from pd.groupby()
for 6 consecutive years. On x axis -- there is oil price (brent) for the year, on y -- the value for sp500 for the same year.
import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
%matplotlib inline
brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna()["2009":"2015"]
fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=year)
This produces the below plot:
(Just in case, from these plots you may infer there was a strong correlation between oil and sp500 in 2010 but not in other years).
You may change kind
in group.plot()
so that it suits your specific kind or data. My anticipation, pandas will preserve the date formatting for x-axis if you have it in your data.
Upvotes: 14