pkpto39

Reputation: 545

Color Regions in a Scatter Plot

I recently found out that you can create color regions for scatter plots in Orange. I know Orange sits on top of Python, so I figured I'd be able to recreate this, but I'm having a hard time. I haven't figured out how to convert a pandas DataFrame into an Orange table. More importantly, I'm working in a Spark environment, so if I could go from PySpark to Orange, that would be even better.

I've set up a basic scatter plot in both seaborn and matplotlib to see if I could figure it out.

import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset from Seaborn
iris = sns.load_dataset("iris")

# Create a scatter plot
sns.scatterplot(x="sepal_length", y="petal_width", hue="species", data=iris)

# Add labels and title
plt.xlabel("Sepal Length")
plt.ylabel("Petal Width")
plt.title("Scatter Plot of Sepal Length vs. Petal Width")

# Show the plot
plt.legend()
plt.show()

[Screenshot of the Orange scatter plot with color regions]

Upvotes: 0

Views: 718

Answers (2)

MuhammedYunus

Reputation: 5010

The code below produces a similar-looking plot to the one you posted. It uses matplotlib directly for plotting.

Output:

[Output plot: scatter of the iris data with shaded class regions]

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib

#
#Load data
#
iris = load_iris(as_frame=True)
iris_x = iris.data
iris_y = iris.target

# Tidy the column names: capitalize and strip the ' (cm)' suffix, e.g. 'sepal length (cm)' -> 'Sepal length'
iris_x.columns = [col.capitalize()[:-5] for col in iris_x.columns]

#
#Choose a color for each class
#

# Choose automatically across all classes
np.random.seed(2)
class_colors = np.random.choice(
    list(matplotlib.colors.CSS4_COLORS),
    size=len(iris_y.unique()),
    replace=False
)

# Alternatively, specify per class:
class_colors = ['tab:red', 'tab:green', 'tab:blue']

print('Class colors are:', class_colors)
# display() is an IPython/Jupyter helper that renders the colormap inline
display(matplotlib.colors.ListedColormap(class_colors))

#Create a colormap out of each color
class_cmaps = [
    matplotlib.colors.LinearSegmentedColormap.from_list('Custom', ['w', color])
    for color in class_colors
]
#View the colormap
# for cmap in class_cmaps: display(cmap)

#
#Select features and fit KNN classifier
#
feat0 = 'Petal length'
feat1 = 'Petal width'
iris_x = iris_x[[feat0, feat1]]

n_neighbors = 10
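# weights='distance' weights neighbors by inverse distance, so closer points have more influence on the prediction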
knn = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance').fit(iris_x.values, iris_y)

#
#Define a feature space and get a prediction over the entire area
#
x_grid, y_grid = np.meshgrid(
    np.linspace(iris_x[feat0].min(), iris_x[feat0].max(), 100),
    np.linspace(iris_x[feat1].min(), iris_x[feat1].max(), 100)
)
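# Flatten the grid into an (n_points, 2) array of (feat0, feat1) pairs for the classifier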
grid_flat = np.hstack([x_grid.reshape(-1, 1), y_grid.reshape(-1, 1)])

#At each point in the feature space, get the:
#predicted class and nearest neighbors
classes = knn.predict(grid_flat)
neighbors = knn.kneighbors(grid_flat, return_distance=False)
#For each point, what proportion of neighbors match the predicted class
prop_per_gridpt = [sum(iris_y[row_neighbors] == clas) / n_neighbors
                   for row_neighbors, clas
                   in zip(neighbors, classes)]

#Convert proportions to colours. Each class has a colour.
rgb_per_gridpt = [
    class_cmaps[clas](prop)
    for clas, prop in zip(classes, prop_per_gridpt)
]
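# Reshape the flat list of RGBA colours back into a (100, 100, 4) image matching the grid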
rgb_per_gridpt = np.array(rgb_per_gridpt).reshape(x_grid.shape + (4,))

#Plot
f, ax = plt.subplots(figsize=(8, 8))
ax.scatter(iris_x[feat0], iris_x[feat1], c=np.choose(iris_y.values, class_colors), s=60,
           alpha=0.7, linewidth=2)
ax.set_xlabel(feat0)
ax.set_ylabel(feat1)
ax.set_title(f'Scatter plot of {feat0} vs. {feat1}')

# Overlay the shaded class regions; extent=ax.axis() stretches the image over the current axis limits
ax.imshow(rgb_per_gridpt, extent=ax.axis(), alpha=0.5,
          interpolation='bicubic', origin='lower')
plt.show()

Upvotes: 1

chthonicdaemon

Reputation: 19760

According to the Orange Documentation:

If a categorical variable is selected in the Color section, the score is computed as follows. For each data instance, the method finds 10 nearest neighbors in the projected 2D space, that is, on the combination of attribute pairs. It then checks how many of them have the same color. The total score of the projection is then the average number of same-colored neighbors.
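To illustrate that scoring rule, here is a minimal sketch (my own reimplementation of the description above, not Orange's actual code) that computes the score with scikit-learn's NearestNeighbors on the two petal features:

from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors
import numpy as np

iris = load_iris()
X = iris.data[:, [2, 3]]  # the projected 2D space: petal length and petal width
y = iris.target           # "color" = class label

k = 10
# Ask for k + 1 neighbours because each point is returned as its own nearest neighbour
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
neighbours = nn.kneighbors(X, return_distance=False)[:, 1:]

# For each instance, count how many of its k neighbours share its class,
# then average over all instances to get the projection score
same_class = (y[neighbours] == y[:, None]).sum(axis=1)
print("Average number of same-coloured neighbours:", same_class.mean())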

For the shaded regions, you can get similar results using scikit-learn's k-nearest neighbours classifier. There is an example in their docs that uses the iris dataset as well.

I've modified this example to be more similar to the screenshot you shared:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

from sklearn import datasets, neighbors
from sklearn.inspection import DecisionBoundaryDisplay

n_neighbors = 10

# import iris dataset
iris = datasets.load_iris()

# Select features
features = [2, 3]
X = iris.data[:, features]
y = iris.target

# Create color maps
cmap_light = ListedColormap(["blue", "red", "green"])
cmap_bold = ["blue", "red", "green"]

# we create an instance of Neighbours Classifier and fit the data.
clf = neighbors.KNeighborsClassifier(n_neighbors, weights="distance")
clf.fit(X, y)

# Plot boundaries
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=cmap_light,
    ax=ax,
    response_method="predict",
    plot_method="pcolormesh",
    xlabel=iris.feature_names[features[0]],
    ylabel=iris.feature_names[features[1]],
    shading="auto",
    alpha=0.3,
)

# Plot training points
sns.scatterplot(
    x=X[:, 0],
    y=X[:, 1],
    hue=iris.target_names[y],
    palette=cmap_bold,
    alpha=1.0,
    edgecolor="black",
)

plt.show()

This is the result:

[Image of the iris dataset coloured by nearest neighbours]

Upvotes: 1
