Qasim Khan

Reputation: 83

Unable to plot distribution of a column containing binary values using Python

I'm trying to plot the original data, before handling the imbalance, in a way that shows the class distribution and the class imbalance (the class is failure = 0/1). I might need to transform the data in both cases to be able to visualize it.

Here's what the column looks like:

| failure |
|---------|
| 1       |
| 0       |
| 0       |
| 1       |
| 0       |

Here's what I have tried so far:

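import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def distribution_scatter(x, symmetric=True, cmap=None, size=None):
    # Estimate the density of x with a Gaussian KDE
    pdf = gaussian_kde(x)
    # Random jitter in [0, 1) used to spread the points vertically
    w = np.random.rand(len(x))

    if symmetric:
        # Map the jitter to [-1, 1) so points spread above and below zero
        w = w * 2 - 1
    # Vertical position: density at each point scaled by the jitter
    # (computed outside the if, so it is also defined when symmetric=False)
    pseudo_y = pdf(x) * w

    if cmap:
        plt.scatter(x, pseudo_y, c=x, cmap=cmap, s=size)
    else:
        plt.scatter(x, pseudo_y, s=size)

    return pseudo_y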

Results:

[Image: plot produced by the code above]

The problem with the results:

I want to plot the distribution of the 0s and 1s, which I believe requires transforming the data in some way.
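A common way to make that transformation, shown here as a minimal sketch rather than the asker's own approach, is to collapse the 0/1 column into per-class counts and proportions, which is the quantity a class-imbalance plot usually displays. The array `failure` below simply reuses the five rows from the table above; with pandas, `df['failure'].value_counts()` would give the same counts.

```python
import numpy as np

# The 'failure' column from the table above (the five example rows)
failure = np.array([1, 0, 0, 1, 0])

# Count the rows in each class: index 0 -> class 0, index 1 -> class 1
counts = np.bincount(failure, minlength=2)

# Share of each class, which makes the imbalance explicit
proportions = counts / counts.sum()

for cls, (n, p) in enumerate(zip(counts, proportions)):
    print(f"failure={cls}: {n} rows ({p:.0%})")
```

The counts can then be drawn as an ordinary two-bar chart (for example with matplotlib's `plt.bar([0, 1], counts)`), which shows the imbalance directly without any density estimate.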

Desired output:

[Image: desired output]

Upvotes: 0

Views: 511

Answers (1)

perl

Reputation: 9941

If you want a KDE plot, take a look at seaborn's kdeplot:

import numpy as np
import seaborn as sns
x = np.random.binomial(1, 0.2, 100)
sns.kdeplot(x=x)

Output:

[Image: KDE plot of the binary sample]
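To see why a 0/1 column produces two bumps rather than one smooth curve, here is a minimal numpy sketch of what a one-dimensional Gaussian KDE computes: the average of a normal kernel centred on every sample. The helper name `gaussian_kde_1d`, the eight-zeros/two-ones sample, and the fixed bandwidth of 0.2 are assumptions for illustration; seaborn's kdeplot chooses its bandwidth automatically, so its curve will be wider or narrower.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Hypothetical helper: evaluate a 1-D Gaussian KDE on `grid`."""
    data = np.asarray(data, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - data[None, :]) / bandwidth
    # Standard normal kernel, averaged over samples and rescaled by the bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth

# Hypothetical binary column: eight 0s and two 1s
x = np.array([0] * 8 + [1] * 2)
grid = np.linspace(-1.0, 2.0, 601)
density = gaussian_kde_1d(x, grid, bandwidth=0.2)  # assumed bandwidth

# Density near 0, 0.5 and 1: the bump at 0 is about four times taller than
# the bump at 1, matching the 8:2 class ratio, with almost nothing in between
for t in (0.0, 0.5, 1.0):
    print(t, round(float(density[np.argmin(np.abs(grid - t))]), 3))
```

Because the ratio of the two peak heights tracks the class ratio, the KDE does show the imbalance, but the spread of each bump comes entirely from the bandwidth, not from the data.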


Update: Or use a swarmplot if you want a scatter plot:

import numpy as np
import seaborn as sns
x = np.random.binomial(1, 0.2, 25)
sns.swarmplot(x=x)

Output:

[Image: swarmplot of the binary sample]


Update 2: In fact, your own function also seems to produce a reasonable visualization:

distribution_scatter(np.random.binomial(1, 0.2, 100))

Output:

[Image: output of distribution_scatter on the binary sample]
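A variant of the same idea, sketched here as an assumption rather than something proposed in the question or the answer, replaces the KDE value pdf(x) with each class's empirical share, so the width of each vertical band is exactly the fraction of rows in that class. The helper name `class_share_jitter` is hypothetical, and the sample is generated with a fixed seed only to make the sketch reproducible.

```python
import numpy as np

def class_share_jitter(x, rng=None):
    """Hypothetical helper: vertical jitter scaled by each point's class share.

    For a 0/1 column, the empirical share of a class plays the role that
    pdf(x) plays in distribution_scatter: points of the majority class
    spread over a wider vertical band than points of the minority class.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    # Share of rows in each class, indexed by the class label (0 or 1)
    share = np.bincount(x, minlength=2) / len(x)
    # Uniform jitter in [-1, 1), as in the symmetric branch of the question
    w = rng.uniform(-1.0, 1.0, size=len(x))
    return share[x] * w

x = np.random.default_rng(0).binomial(1, 0.2, 100)
pseudo_y = class_share_jitter(x, rng=np.random.default_rng(1))
# plt.scatter(x, pseudo_y) would then draw the two bands as before
```

Compared with the KDE version, this avoids choosing a bandwidth, and the band widths can be read directly as class proportions.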

Upvotes: 1
