JayEstrera

Reputation: 51

pandas data frame plotting in subplots

I have the following pandas data frame and would like to create n plots side by side, where n is the number of unique labels (l1, l2, ...) in the a1 column (in the example below there will be two plots, one for l1 and one for l2). Each plot should show a4 on the x-axis against a3 on the y-axis, with one line per a2 value. For example, ax[0] will contain the graph for l1, with three lines linking the points [(1,15),(2,20)], [(1,17),(2,19)], [(1,23),(2,15)] for the data below.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}

df = pd.DataFrame(d)
df

a1  a2  a3  a4
l1  a   15  1
l1  a   20  2
l1  b   17  1
l1  b   19  2
l1  c   23  1
l1  c   15  2
l2  d   22  1
l2  d   21  2
l2  e   23  1
l2  e   23  2
l2  f   24  1
l2  f   27  2

I currently have the following:

def graph(dataframe):
    x = dataframe["a4"]
    y = dataframe["a3"]
    ax[0].plot(x,y) #how do I plot and set the title for each group in their respective subplot without the use of for-loop?
    
fig, ax = plt.subplots(1,len(pd.unique(df["a1"])),sharey='row',figsize=(15,2))
df.groupby(["a1"]).apply(graph)

However, my attempt above plots all of a3 against a4 on the first subplot (because I wrote ax[0].plot()). I can always use a for loop to accomplish the task, but for a large number of unique groups in a1 it will be computationally expensive. Is there a way to turn the line ax[0].plot(x,y) into a one-liner that accomplishes the task without a for loop? Any input is appreciated.

Upvotes: 1

Views: 265

Answers (1)

Patrick FitzGerald

Reputation: 3630

I do not see any way of avoiding a for loop when plotting this data with pandas. My initial thought was to reshape the dataframe to make subplots=True work, like this:

dfp = df.pivot(columns='a1').swaplevel(axis=1).sort_index(axis=1)
dfp

But I do not see how to select level 1 of the columns MultiIndex to make something like dfp.plot(x='a4', y='a3', subplots=True) work.

Removing level 0 and then running the plotting function with dfp.droplevel(axis=1, level=0).plot(x='a4', y='a3', subplots=True) raises ValueError: x must be a label or position. And even if this worked, there would still be the issue of linking the correct points together.
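For comparison, the for-loop route mentioned at the top could be sketched roughly as follows. This is only a sketch under a few assumptions: the names labels, axes and group are mine, and pivoting each group so that every a2 value becomes its own column is just one way to get the points linked correctly. The loop runs once per a1 group, not once per row.

```python
import pandas as pd
from matplotlib import pyplot as plt

d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)

labels = df['a1'].unique()
# squeeze=False keeps axes two-dimensional even if there is only one label
fig, axes = plt.subplots(1, len(labels), sharey='row', figsize=(15, 2), squeeze=False)

# One iteration per a1 group; groupby yields the groups in sorted label order
for ax, (label, group) in zip(axes[0], df.groupby('a1')):
    # Within a group, a4 becomes the index and each a2 value a column,
    # so DataFrame.plot draws one line per a2 value
    group.pivot(index='a4', columns='a2', values='a3').plot(ax=ax, title=label)
```

Each subplot gets its title from the group label, and the legend lists the a2 values drawn in it.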

The seaborn package was created to conveniently plot this kind of dataset. If you are open to using it, here is an example with relplot:

import pandas as pd    # v 1.1.3
import seaborn as sns  # v 0.11.0

d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)

sns.relplot(data=df, x='a4', y='a3', col='a1', hue='a2', kind='line', height=4)

You can customize the colors with the palette argument and adjust the grid layout with col_wrap.

Upvotes: 1
