Jack Twain

Reputation: 6382

Understanding the diagonal in Pandas' scatter matrix plot

I'm plotting a scatter matrix with Pandas. I understand the plot, except for the curves in the diagonal plots. Can someone explain what they mean?

Code:

import pylab
import numpy as np
from pandas.plotting import scatter_matrix  # pandas.tools.plotting no longer exists in current pandas
import pandas as pd

def make_scatter_plot(X, name):    
    """
    Make scatterplot.

    Parameters:
    -----------
    X: a design matrix where each column is a feature and each row is an observation.
    name: the name of the plot.
    """
    pylab.clf()
    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

    for ax in axs[:, 0]:  # the left boundary
        ax.grid(False, axis='both')  # pass a boolean; current matplotlib does not treat the string 'off' as "off"
        ax.set_yticks([0, .5])

    for ax in axs[-1, :]:  # the lower boundary
        ax.grid(False, axis='both')
        ax.set_xticks([0, .5])

    pylab.savefig(name + ".png")

Upvotes: 22

Views: 27626

Answers (2)

scarybuh

Reputation: 51

Pandas plotting methods support a handful of plot styles besides the default line plot. You select one with the kind keyword argument to plot(). The available kinds include:

  • ‘bar’ or ‘barh’ for bar plots
  • ‘hist’ for histogram
  • ‘box’ for boxplot
  • ‘kde’ or ‘density’ for density plots (this is what diagonal='kde' draws on the diagonal of your scatter matrix)
  • ‘area’ for area plots
  • ‘scatter’ for scatter plots
  • ‘hexbin’ for hexagonal bin plots
  • ‘pie’ for pie plots

https://pandas.pydata.org/pandas-docs/stable/visualization.html
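
As a minimal sketch of the kind argument, the snippet below draws the same column twice, once as a histogram and once as a box plot. The data, the column name "feature", and the Agg backend are assumptions made only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is available)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data standing in for one column of a DataFrame.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=200), name="feature")

# The same Series drawn with two different plot kinds, side by side.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
s.plot(kind="hist", bins=20, ax=ax_hist, title="kind='hist'")
s.plot(kind="box", ax=ax_box, title="kind='box'")
```

Calling fig.savefig("kinds.png") afterwards would write the figure to disk. Passing kind="kde" works the same way, but it needs SciPy to be installed.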

Upvotes: 5

Wilduck

Reputation: 14136

As you can tell, the scatter matrix plots each of the specified columns against every other column.

However, in this format, when you get to the diagonal, you would see a plot of a column against itself. Since this would always be a straight line, Pandas gives you more useful information there instead: the distribution of just that column of data. With diagonal='kde', as in your code, that is a density plot (a kernel density estimate), which you can read as a smoothed histogram.

See http://pandas.pydata.org/pandas-docs/stable/visualization.html#density-plot.
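
To make the meaning of the curve concrete, here is a rough NumPy-only sketch of a one-dimensional Gaussian kernel density estimate. It is not pandas' own code (as far as I know, pandas delegates this to SciPy's gaussian_kde). The helper gaussian_kde_1d, the made-up data, and the choice of Scott's rule for the bandwidth are assumptions for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Bandwidth from Scott's rule of thumb (an assumption of this sketch).
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Place one Gaussian bump on every observation and average the bumps.
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Made-up data standing in for one column of the design matrix.
rng = np.random.default_rng(0)
column = rng.normal(loc=0.0, scale=1.0, size=500)

grid = np.linspace(column.min() - 3, column.max() + 3, 400)
density = gaussian_kde_1d(column, grid)

# A density curve is non-negative and encloses a total area of about 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

The height of the curve at a point shows how concentrated the observations are around that value, so peaks mark the regions where the column has the most data.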

If you would rather have a histogram, you could change your plotting code to:

axs = scatter_matrix(df, alpha=0.2, diagonal='hist')
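
For reference, a self-contained sketch of that call on made-up data might look like the following. The column names, the random data, and the Agg backend are assumptions, and the import path is the one used by current pandas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is available)
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up design matrix: 100 observations of 3 features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Histograms on the diagonal, pairwise scatter plots everywhere else.
axs = scatter_matrix(df, alpha=0.2, diagonal="hist")

# axs is a 2-D array of matplotlib Axes with one row and column per feature.
n_diag_bars = len(axs[0, 0].patches)         # histogram bars in a diagonal cell
n_scatter_sets = len(axs[0, 1].collections)  # scatter collections off the diagonal
```

Switching diagonal back to "kde" restores the density curves, which requires SciPy to be installed.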

Upvotes: 27
