Dave

Reputation: 440

Plotting subsetted Pandas data frame using ipywidgets

I would like to plot subsets of a pandas data frame, using dropdown menus from ipywidgets, and I am getting some strange errors.

import ipywidgets as wg
from IPython.display import display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Make Data Frame
#
df = pd.DataFrame({
    
    "x": np.arange(32),
    "y": np.arange(32),
    "A": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
    "B": [0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1],
    "C": [0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1]
})

# Make dropdown menus
#
w1 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='A:',
)
#
w2 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='B:',
)
#
w3 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='C:',
)
#
w4 = wg.Dropdown(
    options=['df'],
    value='df',
    description='DF:',
)

# Define plotting function
#
def myPlot(df, a, b, c):
    print(df)
    print(a)
    print(b)
    print(c)
    x = df["x"][df["A"]==a & df["B"]==b & df["C"]==c]
    y = df["y"][df["A"]==a & df["B"]==b & df["C"]==c]   
    plt.scatter(x,y)
    plt.show()

# Plot with interactive dropdown menus
#
wg.interact(myPlot, df=w4, a=w1, b=w2, c=w3)

The error occurs when I define x in the plotting function: TypeError: string indices must be integers. I think the problem is in how the data frame gets into the plotting function: the print calls show the right values of a, b, and c, but print(df) shows the string df rather than the data frame.

What would be the way to get the plots that I want?

Upvotes: 1

Views: 448

Answers (2)

Paul H

Reputation: 68116

I wouldn't try to extract variables from the namespace with strings.
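To see where the TypeError comes from, here is a minimal sketch that needs no widgets at all. It assumes the dropdown hands the plotting function the plain string 'df' (which is what your print(df) output suggests); the name df_value is only for illustration.

```python
# What a Dropdown with options=['df'] passes along: the string, not the DataFrame.
df_value = "df"

try:
    df_value["x"]  # same operation as df["x"] inside myPlot
except TypeError as err:
    message = str(err)

# The exact wording varies a little between Python versions,
# but it always starts with "string indices must be integers".
print(message)
```

So the string is being indexed like a DataFrame, and Python rejects that before pandas is ever involved.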

I would wrap your interactor and pass the dataframe directly:

import ipywidgets as wg
from IPython.display import display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Make Data Frame
#
df = pd.DataFrame({
    "x": np.arange(32),
    "y": np.arange(32),
    "A": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
    "B": [0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1],
    "C": [0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1]
})

# Make dropdown menus
#
w1 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='A:',
)
#
w2 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='B:',
)
#
w3 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='C:',
)


# Define plotting function
#
def myPlot(df, a, b, c):
    subset = df.loc[df["A"].eq(a) & df["B"].eq(b) & df["C"].eq(c), :]
    fig, ax = plt.subplots()
    ax.scatter("x", "y", data=subset)
    return fig


def interactive_plotter(df):
    df_widget = wg.fixed(df)
    return wg.interact(myPlot, df=df_widget, a=w1, b=w2, c=w3)

fig = interactive_plotter(df)

Another thing to consider is your chained logic statements.

The order of operations with these operators isn't really intuitive: & binds more tightly than ==.

This statement:

x = df["x"][df["A"]==a & df["B"]==b & df["C"]==c]

Is evaluated as:

x = df["x"][df["A"] == (a & df["B"]) == (b & df["C"]) == c]

That is one chained comparison rather than three masks combined with &, so it does not build the filter you want.
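If you want to check the grouping yourself, one option is to let Python parse the expression without evaluating it. This is just a sketch using the standard-library ast module (ast.unparse needs Python 3.9 or later); the names A, B, C, a, b, c are placeholders for the columns and dropdown values.

```python
import ast

# Parse the unparenthesized filter; nothing is evaluated here.
expr = ast.parse("A == a & B == b & C == c", mode="eval").body

# The top-level node is one chained comparison, not a chain of `&`.
print(type(expr).__name__)  # Compare
print(len(expr.ops))        # 3, one per `==`

# The operands after the first `==` show where `&` grabbed its neighbours.
operands = [ast.unparse(node) for node in expr.comparators]
print(operands)             # ['a & B', 'b & C', 'c']
```

The `&` expressions end up as operands of the comparisons, which is why each comparison needs its own parentheses.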

What you want, at a bare minimum, is this:

x = df["x"][(df["A"] == a) & (df["B"] == b) & (df["C"] == c)]

But I think it'd be better to use the .loc accessor:

x = df.loc[(df["A"] == a) & (df["B"] == b) & (df["C"] == c), "x"]

If you don't like all those parentheses, you can also use .eq():

x = df.loc[df["A"].eq(a) & df["B"].eq(b) & df["C"].eq(c), "x"]
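As a quick sanity check, the three working forms above should select the same rows. The sketch below rebuilds your A, B, C columns with np.repeat and np.tile (my shorthand, not something from your code) and picks a=0, b=1, c=0 as an arbitrary example.

```python
import numpy as np
import pandas as pd

# Same pattern as the question's data: A switches every 16 rows, B every 2, C every row.
df = pd.DataFrame({
    "x": np.arange(32),
    "A": np.repeat([0, 1], 16),
    "B": np.tile([0, 0, 1, 1], 8),
    "C": np.tile([0, 1], 16),
})
a, b, c = 0, 1, 0  # example dropdown values

with_parens = df["x"][(df["A"] == a) & (df["B"] == b) & (df["C"] == c)]
with_loc = df.loc[(df["A"] == a) & (df["B"] == b) & (df["C"] == c), "x"]
with_eq = df.loc[df["A"].eq(a) & df["B"].eq(b) & df["C"].eq(c), "x"]

print(with_loc.tolist())  # [2, 6, 10, 14]
print(with_parens.equals(with_loc) and with_loc.equals(with_eq))  # True
```

Which one you use is mostly a matter of taste; the parentheses (or .eq()) are the part that matters.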

Upvotes: 1

ac24

Reputation: 5565

If you do want to switch between different dataframes to plot, put them in a dictionary, give each one a unique string key, pass those keys as the options to the Dropdown, and look the dataframe up by its key inside the plotting function.


import ipywidgets as wg
from IPython.display import display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Make Data Frame
#
df = pd.DataFrame({
    
    "x": np.arange(32),
    "y": np.arange(32),
    "A": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
    "B": [0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1],
    "C": [0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1]
})

choices = {
    'df': df
}

# Make dropdown menus
#
w1 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='A:',
)
#
w2 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='B:',
)
#
w3 = wg.Dropdown(
    options=[0,1],
    value=0,
    description='C:',
)
#
w4 = wg.Dropdown(
    options=list(choices),  # one entry per key in the choices dictionary
    value='df',
    description='DF:',
)

# Define plotting function
#
def myPlot(df_name, a, b, c):
    df = choices[df_name]  # look up the dataframe by its dropdown key
    mask = (df["A"]==a) & (df["B"]==b) & (df["C"]==c)  # build the row filter once
    x = df.loc[mask, "x"]
    y = df.loc[mask, "y"]
    plt.scatter(x,y)
    plt.show()

# Plot with interactive dropdown menus
#
wg.interact(myPlot, df_name=w4, a=w1, b=w2, c=w3)
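The dictionary above only has one entry, so here is a rough sketch of the lookup with two frames, leaving the widgets and plotting out so it runs anywhere. The frames base and doubled and the helper select_xy are made up for illustration; in the notebook the dropdown would supply df_name.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"x": np.arange(4), "y": np.arange(4), "A": [0, 0, 1, 1]})
# Hypothetical second frame, only here so there is something to switch to.
doubled = base.assign(y=base["y"] * 2)

# Unique string key per dataframe; list(choices) would be the Dropdown options.
choices = {"base": base, "doubled": doubled}

def select_xy(df_name, a):
    """Look up the frame by key, filter its rows, and return the x and y values."""
    df = choices[df_name]
    rows = df.loc[df["A"] == a, ["x", "y"]]
    return rows["x"].tolist(), rows["y"].tolist()

print(select_xy("base", 1))     # ([2, 3], [2, 3])
print(select_xy("doubled", 1))  # ([2, 3], [4, 6])
```

Keeping the lookup and filtering in a plain function like this also makes it easy to check outside the notebook before wiring it to interact.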

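PS. The real problem in your indexing calls is the missing parentheses around each comparison, not the order of selection. Selecting the column first or filtering the rows first returns the same values here, though filtering the rows and selecting the column in a single .loc call is the more usual form.

df.loc[(df["A"]==a) & (df["B"]==b) & (df["C"]==c), "x"] # filter rows and select the column in one call

df['x'].loc[(df["A"]==a) & (df["B"]==b) & (df["C"]==c)] # also works: select the column, then filter rows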

Upvotes: 0
