Plotting a line plot in Python matplotlib for an array with a billion entries: running out of RAM

Question

In the code below, I create a big array and want to plot it as a line plot. I have to use numpy's memory-mapped arrays (np.memmap), which I learned about here, just to create the array (and the x-values). The post "Plotting a large number of points using matplotlib and running out of memory" has the same issue, but not with a line plot, and I couldn't figure out how to apply its ideas to my line plot. (Using the tqdm package to track the progress of my loops, it seems that both loops complete, and then the RAM explodes while drawing.)
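
A small sketch of why both loops can finish without trouble: a basic slice of an np.memmap, like `a[start:end]` in the plotting loop, is only a view onto the file on disk, while converting it into a regular array makes an in-RAM copy. The file name and array size here are made up for the demo.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dat")  # hypothetical demo file
    a = np.memmap(path, dtype="float64", mode="w+", shape=(10_001,))
    a[:] = 17.0  # small enough to fill in one go for the demo
    a.flush()

    chunk = a[1:1001]          # basic slice: a view onto the mapped file, no copy
    copied = np.array(chunk)   # explicit conversion: a new in-RAM ndarray

    chunk_is_view = np.shares_memory(chunk, a)
    copy_is_view = np.shares_memory(copied, a)
    print(type(chunk).__name__, chunk_is_view)    # memmap True
    print(type(copied).__name__, copy_is_view)    # ndarray False

    del chunk, a  # drop references so the mapping is released before cleanup
```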

I am running this on Google Colab, and everything is fine until matplotlib draws the figure. I don't understand how this could be an issue, because ultimately it's just producing a .png file, dot by dot and line by line, which can't be that large or memory intensive!

EDIT: if I use plt.savefig() instead of plt.show(), it also fails.
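
The .png intuition does not quite match how matplotlib works: each call to plt.plot or plt.scatter creates an artist that stores all of the points it was given (plus arrays derived from them for drawing), and those stay in memory until the figure is closed. Memory therefore grows with the number of points, not with the pixel count; figsize=(40, 8) at matplotlib's default 100 dpi is roughly 4000 × 800 pixels, against 1E8 points per scatter and per line. A minimal check of this, using a small made-up array and the non-interactive Agg backend:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

N = 100_000  # number of data points, far more than the figure has pixels
y = np.random.default_rng(0).random(N)

# A 4 x 2 inch figure at 50 dpi is only 200 x 100 pixels ...
fig, ax = plt.subplots(figsize=(4, 2), dpi=50)
(line,) = ax.plot(y)                     # ... but the Line2D keeps all N points
dots = ax.scatter(np.arange(N), y, s=1)  # ... and so does the scatter collection

n_line_points = len(line.get_xdata())
n_scatter_points = dots.get_offsets().shape[0]
print(n_line_points, n_scatter_points)  # 100000 100000

# Rendering has to transform and draw every stored point before the PNG exists
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)  # releases the artists and the data they hold
```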

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

width = 40
height = 8
transition = 0.9
chunk_size = int(1E6)  # Define a chunk size for processing data

n = int(1E8)

def calculate_array(n, tr, filename):
    a = np.memmap(filename, dtype='float64', mode='w+', shape=(n + 1,))
    if tr<1:
        for k in range(n, int(n * tr), -1):
            a[k] = 1 / (1 - tr) * (n - k) / n
    elif tr==1:
        a[n]=1

    for k in tqdm(range(int(n * tr), 0, -1), desc=f"Calculating array for t={tr}"):
        a[k] = 17 # of course in reality it's more complicated but let us cast aside those details
    
    a.flush()  # Ensure changes are written to disk

# Values of t to be used
t_values = [1, 0.99, 0.9, 0.8, 0.7, 0.6, 0.5]

# Create a memory-mapped file for x_values
x_filename = "x_values.dat"
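x_values = np.memmap(x_filename, dtype='int64', mode='w+', shape=(n + 1,))
# Fill in chunks so the full 0..n range is never built in RAM at once
for start in range(0, n + 1, chunk_size):
    stop = min(start + chunk_size, n + 1)
    x_values[start:stop] = np.arange(start, stop)
x_values.flush()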

# Plot with both log axes
for t in t_values:
    plt.figure(figsize=(width, height))
    filename = f"array_t_{t}.dat"
    calculate_array(n, t, filename)
    
    # Open the memory-mapped arrays for reading
    a = np.memmap(filename, dtype='float64', mode='r', shape=(n + 1,))
    x_values = np.memmap(x_filename, dtype='int64', mode='r', shape=(n + 1,))
    
    # Plot in chunks (the range stops at n + 1 so that the last index, n, is included)
    for start in tqdm(range(1, n + 1, chunk_size), desc=f"Plotting array for t={t}"):
        end = min(start + chunk_size, n + 1)
        x_chunk = x_values[start:end]
        y_chunk = a[start:end]
        
        plt.scatter(x_chunk, y_chunk, s=1, c='blue')
        
        # For every chunk after the first, start the line at the last point of the previous chunk so the segments join up
        if start > 1:
            plt.plot(x_values[start-1:end], a[start-1:end], linestyle='-', alpha=0.6, color='blue')
        else:
            plt.plot(x_chunk, y_chunk, linestyle='-', alpha=0.6, color='blue')
    
    plt.xscale('log')
    plt.xlabel('Index (log scale)')
    plt.yscale('symlog')
    plt.ylabel('a(k) (symlog scale)')
    plt.title(f'Individual Plot of the array a with both axes in log scale for t={t}')
    plt.legend([f't={t}'])
    plt.grid(True)
    plt.show()
    plt.close()  # free this figure's artists (and their data) before the next t
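
Since the x axis is logarithmic and the figure is only a few thousand pixels wide, one possible workaround (a standard min/max decimation technique, not taken from the linked post) is to reduce the data to a per-bin minimum and maximum over log-spaced index bins and plot only that envelope. A sketch, assuming the same layout as above (index 0 unused, indices 1..n plotted); `log_envelope` and `n_bins` are hypothetical names, and the bin count is an arbitrary choice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the demo
import matplotlib.pyplot as plt
import numpy as np


def log_envelope(a, n_bins=2000):
    """Min/max of a[1:] over bins that are equally wide on a log x axis.

    `a` can be an np.memmap; only the small per-bin results are kept in RAM.
    Returns (x, y_min, y_max), where x is the first index of each bin.
    """
    n = len(a) - 1  # highest index, as in the question
    # Bin edges from 1 to n + 1 on a log scale, rounded down to integer indices;
    # np.unique drops duplicates so that every bin is non-empty.
    edges = np.unique(np.geomspace(1, n + 1, num=n_bins + 1).astype(np.int64))
    starts = edges[:-1]
    # reduceat reduces a[starts[i]:starts[i + 1]]; the last bin runs to the end.
    y_min = np.minimum.reduceat(a, starts)
    y_max = np.maximum.reduceat(a, starts)
    return starts, y_min, y_max


# Demo on a small in-RAM array shaped like the question's data for t = 0.9:
# 17 up to index 0.9 * n, then a linear ramp down to 0 at index n.
n = 100_000
a = np.full(n + 1, 17.0)
k = np.arange(int(n * 0.9) + 1, n + 1)
a[k] = 1 / (1 - 0.9) * (n - k) / n

x, y_min, y_max = log_envelope(a, n_bins=500)

fig, ax = plt.subplots(figsize=(8, 3))
ax.fill_between(x, y_min, y_max, step="post", color="blue", alpha=0.6)
ax.set_xscale("log")
ax.set_yscale("symlog")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Because reduceat keeps the extreme values of every bin, narrow spikes stay visible, which plain subsampling (for example taking every 1000th point) can miss. The same call should work on the memory-mapped `a` from the question, though it still has to read the whole file once.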
