I have a 2-D NumPy array of shape (200, 1500). I want to visualise summary statistics for this data. Because the number of columns is so large, I can't plot all of them. My questions are:
I thought of randomly choosing N columns from the data and showing their distribution and box plots. The example below is for the second column of the array X. However, I can't figure out how to show both plots for N columns in a single figure. Can someone help me with this?
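```python
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(20, 4))
plt.subplot(121)
ax = sns.histplot(X[:, 1], kde=True)  # sns.distplot is deprecated; histplot is the axes-level replacement
plt.subplot(122)
sns.boxplot(x=X[:, 1])  # default autoscaling pads the limits; min()*1.1 would clip data when min > 0
plt.show()
```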
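One way to get both plots for N columns into a single figure is a grid with one row per sampled column: the distribution on the left and the box plot on the right. Below is a minimal sketch, assuming `X` is the (200, 1500) array from the question; `plot_random_columns` is a hypothetical helper name, the random data only stands in for the real `X`, and it uses plain matplotlib (`Axes.hist`, `Axes.boxplot`) instead of seaborn. The seaborn calls fit the same grid if you pass the target axes, e.g. `sns.histplot(values, ax=row[0])` and `sns.boxplot(x=values, ax=row[1])`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_random_columns(X, n_cols=5, seed=0):
    """Histogram + box plot for n_cols randomly chosen columns of X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Sample distinct column indices and sort them so rows appear in column order
    cols = np.sort(rng.choice(X.shape[1], size=n_cols, replace=False))
    # One row per sampled column; sharex="row" puts both plots of a row on the same value axis
    fig, axes = plt.subplots(n_cols, 2, figsize=(12, 2.5 * n_cols),
                             sharex="row", squeeze=False)
    for row, c in zip(axes, cols):
        values = X[:, c]
        row[0].hist(values, bins=30)        # distribution of this column
        row[0].set_ylabel(f"col {c}")
        row[1].boxplot(values, vert=False)  # horizontal box plot on the same x scale
        row[1].set_yticks([])
    axes[0, 0].set_title("distribution")
    axes[0, 1].set_title("box plot")
    fig.tight_layout()
    return fig, axes, cols


# Stand-in data with the same shape as in the question
X = np.random.default_rng(42).normal(size=(200, 1500))
fig, axes, cols = plot_random_columns(X, n_cols=4)
```

`replace=False` keeps the sampled columns distinct, and `squeeze=False` keeps `axes` two-dimensional even when `n_cols=1`. The `matplotlib.use("Agg")` line is only there so the sketch runs headless; drop it in a notebook and call `plt.show()` or `fig.savefig(...)` instead.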
As @Shiva mentioned, the right summary statistics and visualisation approach depend on your problem. The problem formulation determines whether you need mean or median values, standard deviations, eigenvalues, frequency distributions, etc. If you provide more details, the community can offer more specific advice.
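Instead of plotting a few sampled columns, you can also reduce every one of the 1500 columns to a handful of the statistics listed above and then look at how those statistics vary across columns. A minimal NumPy sketch, assuming rows are observations and columns are variables; `column_summary` and `covariance_eigenvalues` are hypothetical helper names and the random array is a stand-in for real data. The eigenvalues of the column covariance matrix are computed from the singular values `s` of the centred data as `s**2 / (n - 1)`, which avoids building the 1500 x 1500 covariance matrix; with 200 rows, at most 200 of them are non-zero.

```python
import numpy as np


def column_summary(X):
    """Per-column summary statistics for a 2-D array (one value per column)."""
    return {
        "mean": X.mean(axis=0),
        "median": np.median(X, axis=0),
        "std": X.std(axis=0, ddof=1),          # sample standard deviation
        "q1": np.percentile(X, 25, axis=0),
        "q3": np.percentile(X, 75, axis=0),
    }


def covariance_eigenvalues(X):
    """Non-zero eigenvalues of the column covariance matrix, largest first.

    Uses the singular values s of the centred data: eig(cov) = s**2 / (n_rows - 1).
    """
    Xc = X - X.mean(axis=0)                     # centre each column
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    return s**2 / (X.shape[0] - 1)


# Stand-in data with the same shape as in the question
X = np.random.default_rng(0).normal(size=(200, 1500))
stats = column_summary(X)
eigvals = covariance_eigenvalues(X)
explained = np.cumsum(eigvals) / eigvals.sum()  # cumulative fraction of total variance
```

Each entry of `stats` is an array of length 1500, so one histogram per statistic (for example `plt.hist(stats["std"], bins=50)`) summarises all columns at once, and `explained` shows how many components are needed to capture most of the variance.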
Nevertheless, there are general-purpose analytical techniques that you could consider. See, e.g., this blog post demonstrating various dimensionality reduction techniques applied to the MNIST data set. Also check out this blog post discussing the application of an autoencoder for this purpose (scroll down). For visualisation specifically, you could browse through the Seaborn examples gallery to see if any of the examples apply to your own dataset.
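As a concrete example of dimensionality reduction, here is principal component analysis (PCA), one standard technique (not necessarily one of those in the linked posts), written with NumPy's SVD. It is a sketch assuming the 200 rows are samples and the 1500 columns are features; `pca_project` is a hypothetical name and the data is random. If scikit-learn is available, `sklearn.decomposition.PCA` provides the same kind of projection.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto its top principal components (PCA via SVD).

    Returns the projected scores and the fraction of total variance
    explained by each of the kept components.
    """
    Xc = X - X.mean(axis=0)                           # centre every column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T                 # equivalently U[:, :k] * s[:k]
    explained = s**2 / np.sum(s**2)                   # variance fraction per component
    return scores, explained[:n_components]


# Stand-in data with the same shape as in the question
X = np.random.default_rng(1).normal(size=(200, 1500))
scores, ratio = pca_project(X, n_components=2)
```

A scatter plot of `scores[:, 0]` against `scores[:, 1]` then shows all 200 samples in two dimensions, and `ratio` tells you how much of the total variance that view retains.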