pkpto39

Reputation: 545

Color Regions in a Scatter Plot

I recently found out that you can create color regions for scatter plots in Orange. I know Orange sits on top of Python, so I figured I'd be able to recreate this, but I'm having a hard time. I haven't figured out how to convert a pandas DataFrame into an Orange table. More importantly, I'm working in a Spark environment, so if I could go from PySpark to Orange, that would be even better.

I've set up a basic scatter plot in both seaborn and matplotlib to see if I could figure it out.

import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset from Seaborn
iris = sns.load_dataset("iris")

# Create a scatter plot
sns.scatterplot(x="sepal_length", y="petal_width", hue="species", data=iris)

# Add labels and title
plt.xlabel("Sepal Length")
plt.ylabel("Petal Width")
plt.title("Scatter Plot of Sepal Length vs. Petal Width")

# Show the plot
plt.legend()
plt.show()

[Screenshot of the Orange scatter plot with color regions]

Upvotes: 0

Views: 718

Answers (2)

MuhammedYunus

Reputation: 5010

The code below produces a similar-looking plot to the one you posted. It uses matplotlib directly for plotting.

Output:

[Output plot: scatter of the iris data with shaded class regions]

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib

#
#Load data
#
iris = load_iris(as_frame=True)
iris_x = iris.data
iris_y = iris.target

# Tidy the column names: capitalize and strip the ' (cm)' suffix, e.g. 'sepal length (cm)' -> 'Sepal length'
iris_x.columns = [col.capitalize()[:-5] for col in iris_x.columns]

#
#Choose a color for each class
#

# Choose automatically across all classes
np.random.seed(2)
class_colors = np.random.choice(
    list(matplotlib.colors.CSS4_COLORS),
    size=len(iris_y.unique()),
    replace=False
)

# Alternatively, specify per class:
class_colors = ['tab:red', 'tab:green', 'tab:blue']

print('Class colors are:', class_colors)
# display() is an IPython/Jupyter helper that renders the colormap inline
display(matplotlib.colors.ListedColormap(class_colors))

#Create a colormap out of each color
class_cmaps = [
    matplotlib.colors.LinearSegmentedColormap.from_list('Custom', ['w', color])
    for color in class_colors
]
#View the colormap
# for cmap in class_cmaps: display(cmap)

#
#Select features and fit KNN classifier
#
feat0 = 'Petal length'
feat1 = 'Petal width'
iris_x = iris_x[[feat0, feat1]]

n_neighbors = 10
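# weights='distance' weights neighbors by inverse distance, so closer points have more influence on the prediction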
knn = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance').fit(iris_x.values, iris_y)

#
#Define a feature space and get a prediction over the entire area
#
x_grid, y_grid = np.meshgrid(
    np.linspace(iris_x[feat0].min(), iris_x[feat0].max(), 100),
    np.linspace(iris_x[feat1].min(), iris_x[feat1].max(), 100)
)
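# Flatten the grid into an (n_points, 2) array of (feat0, feat1) pairs for the classifier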
grid_flat = np.hstack([x_grid.reshape(-1, 1), y_grid.reshape(-1, 1)])

#At each point in the feature space, get the:
#predicted class and nearest neighbors
classes = knn.predict(grid_flat)
neighbors = knn.kneighbors(grid_flat, return_distance=False)
#For each point, what proportion of neighbors match the predicted class
prop_per_gridpt = [sum(iris_y[row_neighbors] == clas) / n_neighbors
                   for row_neighbors, clas
                   in zip(neighbors, classes)]

#Convert proportions to colours. Each class has a colour.
rgb_per_gridpt = [
    class_cmaps[clas](prop)
    for clas, prop in zip(classes, prop_per_gridpt)
]
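# Reshape the flat list of RGBA colours back into a (100, 100, 4) image matching the grid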
rgb_per_gridpt = np.array(rgb_per_gridpt).reshape(x_grid.shape + (4,))

#Plot
f, ax = plt.subplots(figsize=(8, 8))
ax.scatter(iris_x[feat0], iris_x[feat1], c=np.choose(iris_y.values, class_colors), s=60,
           alpha=0.7, linewidth=2)
ax.set_xlabel(feat0)
ax.set_ylabel(feat1)
ax.set_title(f'Scatter plot of {feat0} vs. {feat1}')

# Overlay the shaded class regions; extent=ax.axis() stretches the image over the current axis limits
ax.imshow(rgb_per_gridpt, extent=ax.axis(), alpha=0.5,
          interpolation='bicubic', origin='lower')
plt.show()

Upvotes: 1

chthonicdaemon

Reputation: 19760

According to the Orange Documentation:

If a categorical variable is selected in the Color section, the score is computed as follows. For each data instance, the method finds 10 nearest neighbors in the projected 2D space, that is, on the combination of attribute pairs. It then checks how many of them have the same color. The total score of the projection is then the average number of same-colored neighbors.
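To illustrate that scoring rule, here is a minimal sketch (my own reimplementation of the description above, not Orange's actual code) that computes the score with scikit-learn's NearestNeighbors on the two petal features:

from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors
import numpy as np

iris = load_iris()
X = iris.data[:, [2, 3]]  # the projected 2D space: petal length and petal width
y = iris.target           # "color" = class label

k = 10
# Ask for k + 1 neighbours because each point is returned as its own nearest neighbour
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
neighbours = nn.kneighbors(X, return_distance=False)[:, 1:]

# For each instance, count how many of its k neighbours share its class,
# then average over all instances to get the projection score
same_class = (y[neighbours] == y[:, None]).sum(axis=1)
print("Average number of same-coloured neighbours:", same_class.mean())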

For the shaded regions, you can get similar results using scikit-learn's k-nearest neighbours classifier. There is an example in their docs that uses the iris dataset as well.

I've modified this example to be more similar to the screenshot you shared:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

from sklearn import datasets, neighbors
from sklearn.inspection import DecisionBoundaryDisplay

n_neighbors = 10

# import iris dataset
iris = datasets.load_iris()

# Select features
features = [2, 3]
X = iris.data[:, features]
y = iris.target

# Create color maps
cmap_light = ListedColormap(["blue", "red", "green"])
cmap_bold = ["blue", "red", "green"]

# we create an instance of Neighbours Classifier and fit the data.
clf = neighbors.KNeighborsClassifier(n_neighbors, weights="distance")
clf.fit(X, y)

# Plot boundaries
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=cmap_light,
    ax=ax,
    response_method="predict",
    plot_method="pcolormesh",
    xlabel=iris.feature_names[features[0]],
    ylabel=iris.feature_names[features[1]],
    shading="auto",
    alpha=0.3,
)

# Plot training points
sns.scatterplot(
    x=X[:, 0],
    y=X[:, 1],
    hue=iris.target_names[y],
    palette=cmap_bold,
    alpha=1.0,
    edgecolor="black",
)

plt.show()

This is the result:

[Image of the iris dataset coloured by nearest neighbours]

Upvotes: 1
