user3170725

Reputation: 109

Python Scatter Plot - Overlapping data

I have a scatter plot in which many values often fall on exactly the same spot. I have used colour and alpha to try to remedy the situation. However, as you can see, it's still hard to distinguish what exactly is plotted in some areas.

enter image description here

Is there a more fool-proof way to solve this?

Thanks

Upvotes: 1

Views: 7923

Answers (2)

Quoding

Reputation: 305

If you would rather have a deterministic offset, I wrote this function to solve a similar problem (which is what landed me here looking for an answer). Note that it only works for exactly overlapping points. However, you can most likely round off your points and slightly modify the function to accommodate "close enough" points.

Hopefully this helps.

import numpy as np

def dodge_points(points, component_index, offset):
    """Dodge exactly overlapping points by a multiple of `offset` (the multiplier is the point's order of appearance)

    Args:
        points (array-like (2D)): Array containing the points
        component_index (int): Index / column on which the offset will be applied
        offset (float): Offset amount. Effective offset for each point is `index of appearance` * offset

    Returns:
        numpy.ndarray (2D): Dodged points (a float copy; the input is left unchanged)
    """

    # Work on a float copy: the caller's array is not modified in place,
    # and integer input can receive fractional offsets
    points = np.array(points, dtype=float)

    # Extract unique points so we can map an offset for each
    uniques, inv, counts = np.unique(
        points, return_inverse=True, return_counts=True, axis=0
    )
    # Keep the inverse index 1D regardless of NumPy version
    inv = inv.ravel()

    for i, num_identical in enumerate(counts):
        # Prepare dodge values: 0, offset, 2 * offset, ...
        dodge_values = offset * np.arange(num_identical)
        # Find where the dodge values must be applied, in order
        points_loc = np.where(inv == i)[0]
        # Apply the dodge values
        points[points_loc, component_index] += dodge_values

    return points

Here is an example of before and after.

Before:

Before dodge

After:

After Dodge

This method only works for EXACTLY overlapping points (or if you are willing to round points off in a way that np.unique finds matching points).
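For the "close enough" case, here is a rough sketch of the rounding idea mentioned above. It is not part of the original function: the name `dodge_close_points` and the `decimals` parameter are my own assumptions for illustration. Points are grouped by their rounded coordinates, but the offset is applied to the original (unrounded) values.

```python
import numpy as np

def dodge_close_points(points, component_index, offset, decimals=1):
    """Dodge points that coincide after rounding to `decimals` places.

    Hypothetical variant for illustration; `decimals` sets how close
    two points must be to count as overlapping.
    """
    # Float copy so the caller's array is left untouched
    points = np.array(points, dtype=float)

    # Group on rounded coordinates, but keep the original values for plotting
    keys = np.round(points, decimals)
    _, inv, counts = np.unique(
        keys, return_inverse=True, return_counts=True, axis=0
    )
    inv = inv.ravel()  # keep the inverse index 1D regardless of NumPy version

    for group, size in enumerate(counts):
        rows = np.where(inv == group)[0]
        # 0, offset, 2 * offset, ... in order of appearance
        points[rows, component_index] += offset * np.arange(size)

    return points

pts = np.array([[1.00, 2.00], [1.01, 2.02], [3.00, 4.00], [0.98, 1.99]])
dodged = dodge_close_points(pts, component_index=0, offset=0.5, decimals=1)
```

Keep in mind that rounding creates fixed bins, so two points that are very close but sit on opposite sides of a bin boundary will not be grouped together.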

Upvotes: 1

Dani G

Reputation: 1242

You can jitter the values (add a bit of random noise) so they won't be exactly on the same spot.

import numpy as np
import matplotlib.pyplot as plt


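# Integer-valued sample data, so many points overlap exactly
x = np.random.randint(low=1, high=5, size=50)
y = np.random.randint(low=0, high=2, size=50)
# Add uniform noise in [-0.05, 0.05) to each coordinate
jittered_y = y + 0.1 * np.random.rand(len(y)) - 0.05
jittered_x = x + 0.1 * np.random.rand(len(x)) - 0.05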

plt.figure(figsize=(10,5))

plt.subplot(221)
plt.scatter(x,y,s=10,alpha=0.5)
plt.title('No Jitter')

plt.subplot(222)
plt.scatter(x,jittered_y,s=10,alpha=0.5)
plt.title('Y Jittered')

plt.subplot(223)
plt.scatter(jittered_x,y,s=10,alpha=0.5)
plt.title('X Jittered')

plt.subplot(224)
plt.scatter(jittered_x,jittered_y,s=10,alpha=0.5)
plt.title('Y and X Jittered')

plt.tight_layout()
plt.show()

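If you want the same jitter every time you redraw the plot, one option is to wrap the noise in a small helper that takes a seeded generator. This is only a sketch: the `jitter` function and its `width` parameter are names I made up for illustration, and it uses NumPy's `default_rng` rather than the legacy `np.random.rand` calls above.

```python
import numpy as np

def jitter(values, width=0.1, rng=None):
    """Return `values` plus uniform noise in [-width/2, width/2).

    Hypothetical helper; pass a seeded generator for repeatable output.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width / 2, width / 2, size=values.shape)

rng = np.random.default_rng(0)       # fixed seed, so the jitter is reproducible
x = rng.integers(1, 5, size=50)      # integers in 1..4, like the example above
jittered_x = jitter(x, width=0.1, rng=rng)
```

Keeping `width` well below the spacing between distinct values (here 1) means the jittered points still read as belonging to their original position.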

Upvotes: 8
