Reputation: 545
I recently found out that you can create color regions for scatter plots in Orange. I know Orange sits on top of Python, so I figured I'd be able to recreate this, but I'm having a hard time. I haven't figured out how to convert a pandas DataFrame for use in Orange. More importantly, I'm working in a Spark environment, so going straight from PySpark to Orange would be even better.
I've set up a basic scatter plot using seaborn and matplotlib to see if I could figure it out.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the Iris dataset from Seaborn
iris = sns.load_dataset("iris")
# Create a scatter plot
sns.scatterplot(x="sepal_length", y="petal_width", hue="species", data=iris)
# Add labels and title
plt.xlabel("Sepal Length")
plt.ylabel("Petal Width")
plt.title("Scatter Plot of Sepal Length vs. Petal Width")
# Show the plot
plt.legend()
plt.show()
Upvotes: 0
Views: 718
Reputation: 5010
The code below produces a plot similar to the one you posted. It fits a k-nearest-neighbours classifier with scikit-learn and uses matplotlib directly for plotting.
Output: [plot image not included]
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
#
#Load data
#
iris = load_iris(as_frame=True)
iris_x = iris.data
iris_y = iris.target
# Tidy the column names, e.g. 'petal length (cm)' -> 'Petal length'
iris_x.columns = [col.capitalize()[:-5] for col in iris_x.columns]
#
#Choose a color for each class
#
# Option 1: choose distinct colors at random across all classes
np.random.seed(2)
class_colors = np.random.choice(
    list(matplotlib.colors.CSS4_COLORS),
    size=len(iris_y.unique()),
    replace=False
)
# Option 2 (overrides option 1): specify a color per class
class_colors = ['tab:red', 'tab:green', 'tab:blue']
print('Class colors are:', class_colors)
# In a Jupyter notebook, preview the palette with:
# display(matplotlib.colors.ListedColormap(class_colors))
#Create a white-to-color colormap for each class color
class_cmaps = [
    matplotlib.colors.LinearSegmentedColormap.from_list('Custom', ['w', color])
    for color in class_colors
]
#View the colormap
# for cmap in class_cmaps: display(cmap)
#
#Select features and fit KNN classifier
#
feat0 = 'Petal length'
feat1 = 'Petal width'
iris_x = iris_x[[feat0, feat1]]
n_neighbors = 10
knn = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance').fit(iris_x.values, iris_y)
#
#Define a feature space and get a prediction over the entire area
#
x_grid, y_grid = np.meshgrid(
    np.linspace(iris_x[feat0].min(), iris_x[feat0].max(), 100),
    np.linspace(iris_x[feat1].min(), iris_x[feat1].max(), 100)
)
grid_flat = np.hstack([x_grid.reshape(-1, 1), y_grid.reshape(-1, 1)])
#At each point in the feature space, get the:
#predicted class and nearest neighbors
classes = knn.predict(grid_flat)
neighbors = knn.kneighbors(grid_flat, return_distance=False)
#For each grid point, the proportion of its neighbors that match the predicted class
prop_per_gridpt = [
    sum(iris_y[row_neighbors] == clas) / n_neighbors
    for row_neighbors, clas in zip(neighbors, classes)
]
#Convert proportions to colours. Each class has its own colormap.
rgb_per_gridpt = [
    class_cmaps[clas](prop)
    for clas, prop in zip(classes, prop_per_gridpt)
]
rgb_per_gridpt = np.array(rgb_per_gridpt).reshape(x_grid.shape + (4,))
#Plot
f, ax = plt.subplots(figsize=(8, 8))
ax.scatter(iris_x[feat0], iris_x[feat1], c=np.choose(iris_y.values, class_colors), s=60,
           alpha=0.7, linewidth=2)
ax.set_xlabel(feat0)
ax.set_ylabel(feat1)
ax.set_title(f'Scatter plot of {feat0} vs. {feat1}')
#Overlay the shaded regions. Note: extent=ax.axis() stretches the grid (computed over
#the data range) to the current axis limits, which include a small margin.
ax.imshow(rgb_per_gridpt, extent=ax.axis(), alpha=0.5,
          interpolation='bicubic', origin='lower')
plt.show()
Upvotes: 1
Reputation: 19760
According to the Orange Documentation:
If a categorical variable is selected in the Color section, the score is computed as follows. For each data instance, the method finds 10 nearest neighbors in the projected 2D space, that is, on the combination of attribute pairs. It then checks how many of them have the same color. The total score of the projection is then the average number of same-colored neighbors.
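The scoring rule quoted above can be sketched in a few lines of NumPy. This is only an illustration of the description, not Orange's actual implementation: the function name `same_color_neighbor_score` is hypothetical, the toy data is made up, and it assumes Euclidean distance and that a point is not counted as its own neighbour (the quote does not say either way).

```python
import numpy as np

def same_color_neighbor_score(points, labels, k=10):
    """Average number of the k nearest neighbours that share each point's label."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances in the 2D projection (assumed metric)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Assumption: a point is not counted as its own neighbour
    np.fill_diagonal(dists, np.inf)
    # Indices of the k closest other points, one row per data instance
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Count the neighbours whose label matches the instance's own label
    same = labels[nearest] == labels[:, None]
    return same.sum(axis=1).mean()

# Toy data (made up for illustration): two well-separated clusters
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2))
points = np.vstack([cluster_a, cluster_b])
labels = np.array([0] * 20 + [1] * 20)

score = same_color_neighbor_score(points, labels, k=10)
print(score)
```

A score close to `k` means the classes are cleanly separated in that pair of attributes; a lower score means the colors are mixed.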
You can get similar results using scikit-learn's k-nearest-neighbours classifier. There is an example in their docs that also uses the iris dataset.
I've modified this example to be more similar to the screenshot you shared:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
from sklearn import datasets, neighbors
from sklearn.inspection import DecisionBoundaryDisplay
n_neighbors = 10
# import iris dataset
iris = datasets.load_iris()
# Select features
features = [2, 3]
X = iris.data[:, features]
y = iris.target
# Create color maps
cmap_light = ListedColormap(["blue", "red", "green"])
cmap_bold = ["blue", "red", "green"]
# we create an instance of Neighbours Classifier and fit the data.
clf = neighbors.KNeighborsClassifier(n_neighbors, weights="distance")
clf.fit(X, y)
# Plot boundaries
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=cmap_light,
    ax=ax,
    response_method="predict",
    plot_method="pcolormesh",
    xlabel=iris.feature_names[features[0]],
    ylabel=iris.feature_names[features[1]],
    shading="auto",
    alpha=0.3,
)
# Plot training points
sns.scatterplot(
    x=X[:, 0],
    y=X[:, 1],
    hue=iris.target_names[y],
    palette=cmap_bold,
    alpha=1.0,
    edgecolor="black",
)
plt.show()
This is the result: [plot image not included]
Upvotes: 1