python multiple plots for numpy array

Question

I have a two-dimensional NumPy array of shape (200, 1500) and want to visualise summary statistics for it. Because there are so many columns (1500), I can't plot all of them. My questions are:

Which summary statistics should I visualise?
Do I visualise all columns?
I thought of randomly choosing N columns from the data and showing a distribution plot and a box plot for each. The example below does this for the second column of array X. However, I can't figure out how to show both plots for N columns in a single figure. Can someone help me with this?

dist plot

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(20,4))
plt.subplot(121)
ax = sns.distplot(X[:,1])

Box Plot

plt.subplot(122)
plt.xlim(X[:,1].min()*1.1, X[:,1].max()*1.1)
sns.boxplot(x=X[:,1])

MPA · Accepted Answer

As @Shiva mentioned, the choice of summary statistics and visualisation approach depends on your problem. The problem formulation determines whether you need mean or median values, standard deviations, eigenvalues, frequency distributions, etc. If you provide more details, the community could offer more specific advice.
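If no particular statistic is dictated by the problem, one option is to compute a few per-column statistics and look at how each one is distributed across all 1500 columns, rather than plotting every column. The sketch below is only an illustration: the helper names `column_summary` and `plot_summary` are made up for this example, and the random array stands in for your real `X`.

```python
import numpy as np
import matplotlib.pyplot as plt


def column_summary(X):
    """Return per-column mean, median and sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return {
        "mean": X.mean(axis=0),
        "median": np.median(X, axis=0),
        "std": X.std(axis=0, ddof=1),
    }


def plot_summary(stats, bins=40):
    """Draw one histogram per statistic, each taken over all columns."""
    fig, axes = plt.subplots(1, len(stats), figsize=(5 * len(stats), 4),
                             squeeze=False)
    for ax, (name, values) in zip(axes[0], stats.items()):
        ax.hist(values, bins=bins)
        ax.set_title(f"Per-column {name}")
        ax.set_xlabel(name)
        ax.set_ylabel("number of columns")
    fig.tight_layout()
    return fig


# Placeholder data with the shape from the question (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1500))

stats = column_summary(X)
fig = plot_summary(stats)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Columns whose statistics fall in the tails of these histograms are natural candidates for a closer look.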

Nevertheless, there are general-purpose analytical techniques that you could consider. See e.g. this blog post demonstrating various dimensionality reduction techniques, applied to the MNIST data set. Also check out this blog post discussing the application of an autoencoder for this purpose (scroll down). More specific to visualisation, you could browse through the Seaborn examples gallery to see if there are any examples you could apply to your own dataset.
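As for the specific question of showing both plots for N randomly chosen columns in a single figure, a grid of subplots with one row per column is one way to do it. The following is a minimal sketch using plain matplotlib; the function name `plot_random_columns` and the random placeholder data are assumptions made for illustration. If you prefer Seaborn, `sns.histplot(X[:, col], ax=ax_hist)` and `sns.boxplot(x=X[:, col], ax=ax_box)` should be able to replace the two matplotlib calls.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_random_columns(X, n_cols=5, seed=0):
    """Plot a histogram and a box plot for n_cols randomly chosen columns."""
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    # Sample distinct column indices; sorting just keeps the rows ordered.
    cols = np.sort(rng.choice(X.shape[1], size=n_cols, replace=False))

    # One row per selected column: histogram on the left, box plot on the right.
    fig, axes = plt.subplots(n_cols, 2, figsize=(12, 2.5 * n_cols),
                             squeeze=False)
    for (ax_hist, ax_box), col in zip(axes, cols):
        ax_hist.hist(X[:, col], bins=30)
        ax_hist.set_title(f"Column {col}: histogram")
        ax_box.boxplot(X[:, col])
        ax_box.set_title(f"Column {col}: box plot")
    fig.tight_layout()
    return fig, cols


# Placeholder data with the shape from the question (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1500))

fig, cols = plot_random_columns(X, n_cols=4)
```

Passing the axes explicitly, instead of calling `plt.subplot(121)` and `plt.subplot(122)` for each column, is what lets all N pairs share one figure.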
