Calculus

Reputation: 781

How to plot multiple line charts from a Pandas data frame

I'm trying to make an array of line charts from a data frame like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 10000),
                    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 10000),
                    'TIME_BIN': np.random.randint(1, 86400, size=10000),
                    'COUNT': np.random.randint(1, 700, size=10000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)

         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    270  Wednesday  10:50:00
1     CHICAGO    375  Wednesday  12:20:00
2       MIAMI    490   Thursday  11:30:00
3       MIAMI    571     Sunday  23:30:00
4      DENVER    379   Saturday  07:30:00
...       ...    ...        ...       ...
9995  ATLANTA    107   Saturday  21:10:00
9996   DENVER    127    Tuesday  15:00:00
9997   DENVER    330     Friday  06:20:00
9998  PHOENIX    379   Saturday  19:50:00
9999  CHICAGO    628   Saturday  01:30:00

This is what I have right now:

piv = df.pivot(columns="DAY").plot(x='TIME_BIN', kind="Line", subplots=True)
plt.show()

But the x-axis formatting is messed up, and I need each city to be its own line. How do I fix that? I think I need to loop through each day of the week instead of trying to build the whole array of charts in a single call. I've tried seaborn with no luck. To summarize, what I'm trying to achieve is one subplot per day of the week, with a separate line for each city.

Upvotes: 0

Views: 2173

Answers (1)

ImportanceOfBeingErnest

Reputation: 339062

I don't see how pivoting helps here, because in the end you need to split the data twice: once by day of the week, with each day going into its own subplot, and again by city, with each city getting its own colored line. That puts us at the limit of what pandas can do with its plotting wrapper.
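
To make the two-level split concrete, here is a minimal sketch on a tiny hand-made frame. It assumes the column names from the question; the helper name `split_by_day_and_city` is hypothetical, and a grouped iteration is only one of several ways to express the split.

```python
import pandas as pd

def split_by_day_and_city(df):
    """Return {(day, city): sub-frame}, one entry per line to be drawn."""
    # First key: DAY (one subplot each). Second key: CITY (one line each).
    return {key: sub for key, sub in df.groupby(["DAY", "CITY"])}

# Small illustrative data set (not the random data from the question)
demo = pd.DataFrame({
    "CITY": ["MIAMI", "DENVER", "MIAMI", "DENVER"],
    "DAY": ["Monday", "Monday", "Tuesday", "Monday"],
    "TIME_BIN": pd.to_datetime([600, 1200, 600, 1800], unit="s"),
    "COUNT": [10, 20, 30, 40],
})

pieces = split_by_day_and_city(demo)
print(sorted(pieces))
```

Each key of `pieces` identifies one line in the final figure: the day selects the subplot and the city selects the line within it.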

Matplotlib

With matplotlib, one can loop over the two categories, days and cities, and plot each subset directly.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates

df = pd.DataFrame({ 
    'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 10000),
    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 
                             'Friday', 'Saturday', 'Sunday'], 10000),
    'TIME_BIN': np.random.randint(1, 86400, size=10000),
    'COUNT': np.random.randint(1, 700, size=10000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min')


days = ['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
cities = np.unique(df["CITY"])
fig, axes = plt.subplots(nrows=len(days), figsize=(13,8), sharex=True)

# loop over days (groupby would also work, but it sorts the days alphabetically, not in weekday order)
for i, day in enumerate(days):
    ddf = df[df["DAY"] == day].sort_values("TIME_BIN")
    # loop over cities
    for city in cities:
        dddf = ddf[ddf["CITY"] == city]
        axes[i].plot(dddf["TIME_BIN"], dddf["COUNT"], label=city)
    axes[i].margins(x=0)
    axes[i].set_title(day)


fmt = matplotlib.dates.DateFormatter("%H:%M")
axes[-1].xaxis.set_major_formatter(fmt)
axes[0].legend(bbox_to_anchor=(1.02, 1))
fig.subplots_adjust(left=0.05, bottom=0.05, top=0.95, right=0.85, hspace=0.8)
plt.show()
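
If `groupby` is preferred for the outer loop, one possible way to keep the weekday order is to turn `DAY` into an ordered categorical first. This is a sketch under that assumption, not part of the code above; the helper name `iter_days_in_order` is hypothetical.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def iter_days_in_order(df, days=DAYS):
    """Yield (day, sub-frame) pairs in weekday order via groupby."""
    # With a categorical key, groupby sorts by category order
    # instead of alphabetically.
    day_cat = pd.Categorical(df["DAY"], categories=days, ordered=True)
    ordered = df.assign(DAY=day_cat)
    # observed=True skips weekdays that have no rows at all.
    for day, ddf in ordered.groupby("DAY", observed=True):
        yield day, ddf.sort_values("TIME_BIN")

# Small illustrative data set (not the random data from the question)
demo = pd.DataFrame({
    "DAY": ["Sunday", "Friday", "Monday", "Friday"],
    "CITY": ["MIAMI", "DENVER", "MIAMI", "MIAMI"],
    "TIME_BIN": pd.to_datetime([1800, 1200, 600, 600], unit="s"),
    "COUNT": [5, 6, 7, 8],
})

order = [day for day, _ in iter_days_in_order(demo)]
print(order)
```

The pairs could then be zipped with the axes, for example `for ax, (day, ddf) in zip(axes, iter_days_in_order(df))`. Note that a weekday with no rows is skipped, so the number of subplots would have to match the number of groups.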

Seaborn

Roughly the same can be achieved with a seaborn FacetGrid.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates
import seaborn as sns

df = pd.DataFrame({ 
    'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 10000),
    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 
                             'Friday', 'Saturday', 'Sunday'], 10000),
    'TIME_BIN': np.random.randint(1, 86400, size=10000),
    'COUNT': np.random.randint(1, 700, size=10000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min')

days = ['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
cities = np.unique(df["CITY"])

g = sns.FacetGrid(data=df.sort_values('TIME_BIN'), 
                  row="DAY", row_order=days, 
                  hue="CITY", hue_order=cities, sharex=True, aspect=5)
g.map(plt.plot, "TIME_BIN", "COUNT")

g.add_legend()
g.fig.subplots_adjust(left=0.05,bottom=0.05, top=0.95,hspace=0.8)
fmt = matplotlib.dates.DateFormatter("%H:%M")
g.axes[-1,-1].xaxis.set_major_formatter(fmt)
plt.show()
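
Both versions draw every row as its own point. With the random sample data, several rows are likely to share the same day, city and time bin, which makes a line jump vertically at that bin. If one value per bin is wanted, the data could be aggregated before plotting. The sketch below sums the counts, which is an assumption (a mean may fit other data better); the helper name `total_per_bin` is hypothetical.

```python
import pandas as pd

def total_per_bin(df):
    """Collapse repeated (DAY, CITY, TIME_BIN) rows into one summed COUNT."""
    return (df.groupby(["DAY", "CITY", "TIME_BIN"], as_index=False)["COUNT"]
              .sum())

# Small illustrative data set with one duplicated bin
demo = pd.DataFrame({
    "DAY": ["Monday", "Monday", "Monday"],
    "CITY": ["MIAMI", "MIAMI", "DENVER"],
    "TIME_BIN": pd.to_datetime([600, 600, 600], unit="s"),
    "COUNT": [10, 5, 7],
})

agg = total_per_bin(demo)
print(agg)
```

The aggregated frame keeps the same four columns, so it can be passed to either plotting approach in place of `df`.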

Upvotes: 3
