Reputation: 23
I got this error no matter how I tried to fix this bug:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipywidgets/widgets/interaction.py:243, in interactive.update(self, *args)
    241 value = widget.get_interact_value()
    242 self.kwargs[widget._kwarg] = value
--> 243 self.result = self.f(**self.kwargs)
    244 show_inline_matplotlib_plots()
    245 if self.auto_display and self.result is not None:
File ~/Documents/GitHub/Machine_Learning/Machine_Learning_in_School/Lab8_PCA_ungraded/pca_utils.py:285, in plot_widget.<locals>.update(angle)
    283 def update(angle):
    284     ang = angle
--> 285     with fig.batch_update():
    286         p0r = rotation_matrix(ang)@p0
    287         p1r = rotation_matrix(ang)@p1
NameError: cannot access free variable 'fig' where it is not associated with a value in enclosing scope
# PCA - An example on Exploratory Data Analysis
In this notebook you will:
- Replicate Andrew's example on PCA
- Visualize how PCA works on a small 2-dimensional dataset and see that not every projection is "good"
- Visualize how 3-dimensional data can also be contained in a 2-dimensional subspace
- Use PCA to find hidden patterns in a high-dimensional dataset
## Importing the libraries
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from pca_utils import plot_widget
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
import matplotlib.pyplot as plt
import plotly.offline as py
py.init_notebook_mode()
output_notebook()
We are going to work on the same example that Andrew showed in the lecture.
X = np.array([[ 99, -1],
[ 98, -1],
[ 97, -2],
[101, 1],
[102, 1],
[103, 2]])
plt.plot(X[:,0], X[:,1], 'ro')
pca_2 = PCA(n_components=2)
pca_2
# Let's fit the data. We do not need to center it ourselves, since sklearn's implementation already handles that.
pca_2.fit(X)
pca_2.explained_variance_ratio_
The coordinates on the first principal component (first axis) are enough to retain 99.24% of the information ("explained variance"). The second principal component adds an additional 0.76% of the information ("explained variance") that is not stored in the first principal component coordinates.
X_trans_2 = pca_2.transform(X)
X_trans_2
Think of column 1 as the coordinate along the first principal component (the first new axis) and column 2 as the coordinate along the second principal component (the second new axis).
You can probably just choose the first principal component since it retains 99% of the information (explained variance).
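As a quick sanity check (a sketch, assuming pca_2 and X_trans_2 from the cells above are still in scope), the explained-variance ratios are just the per-axis variances of the transformed coordinates divided by their total:
# Sketch: recover the explained-variance ratios from the transformed data.
var_per_axis = X_trans_2.var(axis=0, ddof=1)   # variance along PC1 and PC2
print(var_per_axis / var_per_axis.sum())       # ~ [0.9924, 0.0076]
print(pca_2.explained_variance_ratio_)         # should match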
pca_1 = PCA(n_components=1)
pca_1
pca_1.fit(X)
pca_1.explained_variance_ratio_
X_trans_1 = pca_1.transform(X)
X_trans_1
Notice how this column is just the first column of X_trans_2.
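You can verify this directly (a sketch using the arrays defined above; the comparison uses absolute values in case the axis sign convention differs between the two fits):
# Sketch: the 1-component coordinates equal the first column of the
# 2-component coordinates (up to a possible sign flip of the axis).
print(np.allclose(np.abs(X_trans_1[:, 0]), np.abs(X_trans_2[:, 0])))  # expected: True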
If you have 2 features (two columns of data) and choose 2 principal components, you keep all the information, and the reconstructed data ends up the same as the original.
X_reduced_2 = pca_2.inverse_transform(X_trans_2)
X_reduced_2
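Since no information was discarded, the reconstruction should match the original data. A quick check (a sketch using X and X_reduced_2 from above):
# Sketch: a 2-component round trip recovers the original points.
print(np.allclose(X_reduced_2, X))   # expected: True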
plt.plot(X_reduced_2[:,0], X_reduced_2[:,1], 'ro')
Reduce to 1 dimension instead of 2
X_reduced_1 = pca_1.inverse_transform(X_trans_1)
X_reduced_1
plt.plot(X_reduced_1[:,0], X_reduced_1[:,1], 'ro')
Notice how the data now lie on a single line (this line is the single principal component that was used to describe the data, and each example has a single "coordinate" along that axis to describe its location).
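To see what inverse_transform is doing here (a sketch, assuming pca_1, X_trans_1 and X_reduced_1 from the cells above): the 1-D reconstruction is the data mean plus each coordinate times the first principal direction, which is exactly why every reconstructed point falls on one line.
# Sketch: rebuild the reconstruction manually from the fitted attributes.
manual = pca_1.mean_ + X_trans_1 @ pca_1.components_   # shape (6, 2)
print(np.allclose(manual, X_reduced_1))                # expected: True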
Let's define $10$ points in the plane and use them as an example to visualize how we can compress these points into 1 dimension. You will see that there are good ways and bad ways to do it.
X = np.array([[-0.83934975, -0.21160323],
[ 0.67508491, 0.25113527],
[-0.05495253, 0.36339613],
[-0.57524042, 0.24450324],
[ 0.58468572, 0.95337657],
[ 0.5663363 , 0.07555096],
[-0.50228538, -0.65749982],
[-0.14075593, 0.02713815],
[ 0.2587186 , -0.26890678],
[ 0.02775847, -0.77709049]])
p = figure(title = '10-point scatterplot', x_axis_label = 'x-axis', y_axis_label = 'y-axis') ## Creates the figure object
p.scatter(X[:,0],X[:,1],marker = 'o', color = '#C00000', size = 5) ## Add the scatter plot
## Some visual adjustments
p.grid.visible = False
p.outline_line_color = None
p.toolbar.logo = None
p.toolbar_location = None
p.xaxis.axis_line_color = "#f0f0f0"
p.xaxis.axis_line_width = 5
p.yaxis.axis_line_color = "#f0f0f0"
p.yaxis.axis_line_width = 5
show(p)
The next cell generates a widget where you can see how different ways of compressing this data into 1-dimensional points lead to different spreads of the points in the new space. The line found by PCA is the one that keeps the projected points as spread out as possible.
You can use the slider to rotate the black line through its center and see how the points' projections onto the line change as the line rotates.
Notice that some projections place different points almost on top of each other, while others keep the points about as separated as they were in the plane.
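If the widget does not render (for example because of the fig error shown above), you can still get the idea numerically. Here is a small sketch, using the 10-point X defined above, that computes the variance of the 1-D projections for a few angles; the direction of the first principal component should give the largest spread:
# Sketch: variance of the centered data projected onto a line at a given angle.
for deg in [0, 30, 60, 90, 120, 150]:
    theta = np.deg2rad(deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    proj = (X - X.mean(axis=0)) @ direction          # 1-D coordinates along the line
    print(f"angle {deg:>3} deg -> projection variance {proj.var(ddof=1):.4f}")
# For comparison, the variance captured by the first principal component:
print(PCA(n_components=1).fit(X).explained_variance_)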
plot_widget()
In this section we will see how some 3-dimensional data can be condensed into a 2-dimensional space.
from pca_utils import random_point_circle, plot_3d_2d_graphs
X = random_point_circle(n = 150)
deb = plot_3d_2d_graphs(X)
deb.update_layout(yaxis2 = dict(title_text = 'test', visible=True))
Let's load a toy dataset with $500$ samples and $1000$ features.
df = pd.read_csv("toy_dataset.csv")
df.head()
This is a dataset with $1000$ features.
Let's try to see if there is a pattern in the data. The following function randomly samples 100 distinct pairs (x, y) of features, so we can scatter-plot them against each other.
def get_pairs(n = 100):
    # Randomly sample n distinct pairs of feature names from df.
    from random import randint
    i = 0
    tuples = []
    n_cols = len(df.columns)
    while i < n:  # use the n argument instead of a hard-coded 100
        x = df.columns[randint(0, n_cols - 1)]
        y = df.columns[randint(0, n_cols - 1)]
        while x == y or (x, y) in tuples or (y, x) in tuples:  # avoid self-pairs and duplicates
            y = df.columns[randint(0, n_cols - 1)]
        tuples.append((x, y))
        i += 1
    return tuples
pairs = get_pairs()
Now let's plot them!
fig, axs = plt.subplots(10, 10, figsize = (35, 35))
i = 0
for rows in axs:
    for ax in rows:
        ax.scatter(df[pairs[i][0]], df[pairs[i][1]], color = "#C00000")
        ax.set_xlabel(pairs[i][0])
        ax.set_ylabel(pairs[i][1])
        i += 1
It looks like there is not much information hidden in the pairwise feature plots. It is also not feasible to check every combination, due to the number of features. Let's look at the linear correlation between them instead.
corr = df.corr()
mask = (abs(corr) > 0.5) & (abs(corr) != 1)
corr.where(mask).stack().sort_values()
The strongest correlations are only around $0.631$ to $0.632$ in absolute value. This does not reveal much either. Let's try a PCA decomposition to compress our data into a 2-dimensional subspace (a plane) so we can plot it as a scatter plot.
pca = PCA(n_components = 2) # Here we choose the number of components that we will keep.
X_pca = pca.fit_transform(df)
df_pca = pd.DataFrame(X_pca, columns = ['principal_component_1','principal_component_2'])
df_pca.head()
plt.scatter(df_pca['principal_component_1'],df_pca['principal_component_2'], color = "#C00000")
plt.xlabel('principal_component_1')
plt.ylabel('principal_component_2')
plt.title('PCA decomposition')
This is great! We can see well defined clusters.
sum(pca.explained_variance_ratio_)
And we preserved only around 14.6% of the variance! Quite impressive! We can clearly see clusters in our data, something that we could not see before. How many clusters can you spot? 8, 10?
If we run PCA with 3 components and plot in 3 dimensions, we will get more information from the data.
pca_3 = PCA(n_components = 3).fit(df)
X_t = pca_3.transform(df)
df_pca_3 = pd.DataFrame(X_t,columns = ['principal_component_1','principal_component_2','principal_component_3'])
import plotly.express as px
fig = px.scatter_3d(df_pca_3, x = 'principal_component_1', y = 'principal_component_2', z = 'principal_component_3').update_traces(marker = dict(color = "#C00000"))
fig.show()
!pip3 install nbformat
sum(pca_3.explained_variance_ratio_)
Now we preserved 19% of the variance and we can clearly see 10 clusters.
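If you are curious how quickly the explained variance grows as components are added, here is a short sketch (reusing df from above; pca_10 is just an illustrative name):
# Sketch: cumulative explained variance for the first 10 principal components.
pca_10 = PCA(n_components = 10).fit(df)
print(np.cumsum(pca_10.explained_variance_ratio_))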
Congratulations on finishing this notebook!
I tried a fix based on Copilot's answer, which said: "However, I can provide a general example of how to define and use a variable like fig in the correct scope."
Here's an example using Matplotlib to create a figure and plot data:
import matplotlib.pyplot as plt

def create_plot():
    # Define the figure in the correct scope
    fig, ax = plt.subplots()
    # Plot some data
    ax.plot([1, 2, 3], [1, 4, 9])
    # Set labels and title
    ax.set_xlabel('X-axis')
    ax.set_ylabel('Y-axis')
    ax.set_title('Sample Plot')
    # Show the plot
    plt.show()
    return fig
# Call the function to create and display the plot
fig = create_plot()
Upvotes: 1
Views: 66
Reputation: 525
The issue is that fig is called on line 285 before it is instantiated on lines 311 & 312.
You can fix this by simply instantiating fig and rhs_fig above the nested def update(angle): function, so they exist in the parent function's scope, by adding these two lines:
fig = go.FigureWidget(data = final_data ).update_yaxes(scaleanchor = 'x', scaleratio= 1, range = [-1,1], visible=False).update_xaxes(range = [-1.5,1.5], visible=False)
rhs_fig = go.FigureWidget(data = rhs_line.data + rhs_scatter.data).update_yaxes(scaleanchor = 'x', scaleratio= 1, range = [-1,1], showgrid=False, visible=False).update_xaxes(range = [-1.5,1.5], showgrid=False, visible=False)
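In other words, the general pattern is to create the FigureWidget objects in the enclosing plot_widget function before the nested update is defined, so the closure can capture them. A minimal sketch of that structure (with hypothetical placeholder data; only the placement of fig relative to update matters):
import numpy as np
import plotly.graph_objects as go
from ipywidgets import interact

def plot_widget():
    # Create the figure BEFORE the nested function, so 'fig' is bound
    # in the enclosing scope by the time update() runs.
    fig = go.FigureWidget(data=[go.Scatter(x=[0, 1], y=[0, 1], mode='markers')])

    def update(angle):
        with fig.batch_update():          # 'fig' is now a properly bound free variable
            theta = np.deg2rad(angle)
            fig.data[0].x = [0, np.cos(theta)]
            fig.data[0].y = [0, np.sin(theta)]

    interact(update, angle=(0, 180, 1))
    return fig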
The Coursera environment uses old versions of matplotlib, pandas, etc., so I suspect this error is ignored by that environment or is less of an issue in older library versions.
Upvotes: 1
Reputation: 9790
At this time this isn't an answer as such, but it rounds out more of what the original poster should have included.
Plus, this reply points out that the issue doesn't seem to really interfere with observing what the plot is meant to illustrate.
Searching 'Replicate Andrew's example on PCA' in Google led me to the notebook here, which looks to be the same one the OP is running.
I then obtained that notebook and uploaded it to a fresh remote session (served by the MyBinder service) started from the launch badge here. (Or you can just click here to get yourself a session such as I describe, and save yourself the steps of switching to JupyterLab for ease of uploading the notebook.)
(But actually use the notebook I modified here, because it fetches the linked script file that is needed and sets up the environment with a couple of things the session didn't already have.)
After the session comes up, if you are in the Jupyter Notebook mode your URL will look something like https://hub.binder.curvenote.dev/user/fomightez-clust-analysis-binder-w8zoszhl/notebooks/index.ipynb. Delete the notebooks/index.ipynb from the end of the URL in your address bar and replace it with lab, to give the address bar something like https://hub.binder.curvenote.dev/user/fomightez-clust-analysis-binder-w8zoszhl/lab, and hit enter. Confirm the "leave page" prompt and the page should reload as JupyterLab. You can then easily drag and drop the notebook from your local machine into the running session: click on it on your local machine, drag it into the file browser panel on the left side of the JupyterLab view, and release the mouse. Use my version of the notebook from the gist though, available here, as I added stuff at the top needed to make it run in the MyBinder-served session with the environment I had set up for looking into clusters.
The error seen after running the plot_widget() cell under the section 'Visualizing the PCA algorithm' (look for In [19] in front of the cell in the rendering here) is:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/ipywidgets/widgets/interaction.py:243, in interactive.update(self, *args)
241 value = widget.get_interact_value()
242 self.kwargs[widget._kwarg] = value
--> 243 self.result = self.f(**self.kwargs)
244 show_inline_matplotlib_plots()
245 if self.auto_display and self.result is not None:
File ~/pca_utils.py:285, in plot_widget.<locals>.update(angle)
283 def update(angle):
284 ang = angle
--> 285 with fig.batch_update():
286 p0r = rotation_matrix(ang)@p0
287 p1r = rotation_matrix(ang)@p1
NameError: free variable 'fig' referenced before assignment in enclosing scope
Surprisingly, that error cannot be viewed in context here, because the error seems to end up inside the widget output and nbviewer doesn't show that text. (Nor does GitHub here, although that is less surprising, since GitHub's rendering is more limited than nbviewer in many ways.)
Although this is usually frowned upon, I will add a screenshot to show it in context, as this is an unusual case:
However, I don't actually see the error being much of a problem. The plot is still interactive: as I slide the 'angle' slider, it updates and changes multiple aspects of the two plots. Hence, it still seems to illustrate what the notebook author intended, so I'm not really seeing this as a big problem.
From the error it looks like pca_utils.py references fig in the line with fig.batch_update():, and that it maybe does so too early, or not in the right way given modern ipywidgets. However, I believe the course organizers would need to deal with it, as the file pca_utils.py comes from them and not from ipywidgets or the notebook. Despite that, as I said, it doesn't seem to interfere with the plot showing what it is meant to show at this time.
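For anyone who wants to see the scoping issue in isolation, here is a minimal, self-contained reproduction (plain strings instead of figures, not the actual pca_utils.py code): a nested function that closes over fig raises the same NameError if it is called before fig is assigned in the enclosing function, and works once the assignment is moved above the call, which is essentially what the earlier answer's fix does.
def outer_broken():
    def update():
        print(fig)            # 'fig' is a free variable from the enclosing scope
    update()                  # NameError: 'fig' is not yet bound in outer_broken
    fig = "figure object"     # the assignment happens too late

def outer_fixed():
    fig = "figure object"     # bind the free variable first...
    def update():
        print(fig)            # ...so the closure can read it
    update()                  # prints: figure object

outer_fixed()
# outer_broken()              # uncomment to see the NameError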
Upvotes: 0