Reputation: 6382
I'm plotting a scatter matrix with pandas. I understand the plot, except for the curves in the diagonal plots. Can someone explain to me what they mean?
Image:
Code:
import pylab
import numpy as np
from pandas.plotting import scatter_matrix  # was pandas.tools.plotting before pandas 0.20
import pandas as pd

def make_scatter_plot(X, name):
    """
    Make scatterplot.

    Parameters:
    -----------
    X: a design matrix where each column is a feature and each row is an observation.
    name: the name of the plot.
    """
    pylab.clf()
    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')
    for ax in axs[:, 0]:  # the left boundary
        ax.grid(False, axis='both')  # the string 'off' is not treated as False by current matplotlib
        ax.set_yticks([0, .5])
    for ax in axs[-1, :]:  # the lower boundary
        ax.grid(False, axis='both')
        ax.set_xticks([0, .5])
    pylab.savefig(name + ".png")
Upvotes: 22
Views: 27626
Reputation: 51
The pandas plotting methods support a handful of plot styles other than the default line plot. The style is selected with the kind keyword argument to plot(). The full list is in the visualization documentation:
https://pandas.pydata.org/pandas-docs/stable/visualization.html
Upvotes: 5
Reputation: 14136
As you can tell, the scatter matrix plots each of the specified columns against every other column.
On the diagonal, however, this layout would plot a column against itself. Since that would always be a straight line, pandas plots something more useful instead: a density plot of just that column of data (a kernel density estimate, because you passed diagonal='kde').
See http://pandas.pydata.org/pandas-docs/stable/visualization.html#density-plot.
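To make the diagonal curve less mysterious, here is a minimal NumPy-only sketch of a one-dimensional Gaussian kernel density estimate. It is not the pandas implementation (as far as I know pandas delegates the KDE to scipy.stats.gaussian_kde); the function name gaussian_kde_1d, the synthetic column, and the choice of Scott's rule for the bandwidth are assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_1d(values, grid):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = values.size
    # Bandwidth from Scott's rule of thumb (an assumption of this sketch).
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # Place one Gaussian bump on every observation and average them.
    z = (grid[:, None] - values[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.sum(axis=1) / n


# Synthetic stand-in for a single DataFrame column.
rng = np.random.default_rng(0)
column = rng.normal(loc=0.0, scale=1.0, size=500)

# Evaluate the density on a grid that extends a little past the data range.
grid = np.linspace(column.min() - 3, column.max() + 3, 1000)
density = gaussian_kde_1d(column, grid)

# Trapezoidal area under the curve; a density should integrate to about 1.
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
```

Plotting density against grid gives a smooth curve of the same kind as the one on the diagonal: it is high where that column has many observations and low where it has few, and the area under it is approximately 1.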
If you would rather have a histogram, you could change your plotting code to:
axs = scatter_matrix(df, alpha=0.2, diagonal='hist')
Upvotes: 27