Efficient usage of memory with large data size

Question

I am creating a plot as follows:

for i in range(len(classederror)):
    plt.scatter(xlag, classederror[i, :])
plt.show()

with the sizes of the variables being:

xlag = np.array([2, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250])

xlag.shape = (11,)

classederror = 176501 rows x 11 columns

However, I run into a memory problem, which is due to the large size of classederror.

Is there a more Pythonic or more efficient way of doing this that avoids the memory problem?

WHAT I AM TRYING TO DO

As seen in the image below, the x-axis is xlag and the y-axis is classederror.

I want to plot each row of classederror against the x-axis values in xlag and study the distribution of the data. In the end I should obtain something similar to the image below.

ImportanceOfBeingErnest · Accepted Answer

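It is of course much more efficient to draw a single scatter plot than 176501 separate ones, because every call to plt.scatter creates its own collection object that matplotlib has to keep in memory and draw.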

import numpy as np
import matplotlib.pyplot as plt

xlag = np.array([2, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250])
classederror = (np.random.randn(176501, 11)*25)*(0.2+np.sort(np.random.rand(11)))

plt.scatter(np.tile(xlag,len(classederror)), classederror.flatten())

plt.show()
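To see why np.tile and flatten line up, here is a minimal check on a small stand-in array. The 3 x 4 shape and the values are made up for illustration; only the pairing logic matters.

```python
import numpy as np

# Hypothetical stand-in data: 3 rows x 4 columns instead of 176501 x 11.
x = np.array([2, 25, 50, 75])
y = np.arange(12).reshape(3, 4)

x_flat = np.tile(x, len(y))  # x repeated once per row of y
y_flat = y.flatten()         # rows of y concatenated in order (row-major)

# Each flattened pair (x_flat[k], y_flat[k]) is the point (x[j], y[i, j]).
for k in range(y.size):
    i, j = divmod(k, y.shape[1])
    assert x_flat[k] == x[j]
    assert y_flat[k] == y[i, j]
```

So the single scatter call receives the same points as the loop in the question, just in two long 1D arrays.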

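Given the limited information one can draw from such a plot (all it really shows is the range of values at each xlag), it may make sense to directly plot 11 vertical lines, each running from the minimum to the maximum of one column.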

import numpy as np
import matplotlib.pyplot as plt

xlag = np.array([2, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250])
classederror = (np.random.randn(176501, 11)*25)*(0.2+np.sort(np.random.rand(11)))

vals = np.c_[classederror.min(axis=0),classederror.max(axis=0)].T
x= np.c_[xlag,xlag].T
plt.plot(x,vals, color="C0", lw=2)

plt.show()

To obtain information about the density of points, one may use other means, e.g. a violin plot.

# Reuses xlag and classederror from the snippets above.
plt.violinplot(classederror, xlag, points=50, widths=20,
               showmeans=True, showextrema=True, showmedians=True)

plt.show()
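If a violin plot is more than needed, percentile bands are another way to summarize the distribution per column. The following is a hedged sketch rather than part of the answer's method: the choice of percentiles (5, 25, 50, 75, 95), the seed, and the Agg backend are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

xlag = np.array([2, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250])

# Stand-in data in the same style as above, with a fixed seed (assumption).
rng = np.random.default_rng(0)
classederror = (rng.standard_normal((176501, 11)) * 25) * (0.2 + np.sort(rng.random(11)))

# Percentiles along the rows: the result has shape (5, 11), one row per percentile.
q = np.percentile(classederror, [5, 25, 50, 75, 95], axis=0)

fig, ax = plt.subplots()
ax.fill_between(xlag, q[0], q[4], color="C0", alpha=0.2, label="5th to 95th percentile")
ax.fill_between(xlag, q[1], q[3], color="C0", alpha=0.4, label="25th to 75th percentile")
ax.plot(xlag, q[2], color="C0", lw=2, label="median")
ax.legend()
```

This reduces the 176501 x 11 array to 5 x 11 numbers before anything is drawn, so the plotting cost no longer depends on the number of rows.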
