Nguai al

Reputation: 958

Jupyter Lab instance crashes with 502 error

I am using a JupyterLab virtual notebook instance from GCP Vertex AI Workbench.

I am reading 2 billion rows of data, where each row consists of 3 columns of 8 bytes each.

I am reading 100 million rows at a time and concatenating each batch to a Pandas DataFrame.
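For reference, the loop has roughly this shape (a simplified sketch: read_chunk is a placeholder, since the real data source is not shown, and the row counts are scaled down so the sketch actually runs). Note that the final frame is about 2,000,000,000 × 3 × 8 bytes ≈ 48 GB, and concatenating inside the loop briefly holds both the old and the new frame, so peak memory runs well above that:

    import numpy as np
    import pandas as pd

    N_CHUNKS = 20           # 20 chunks of 100M rows = 2B rows in the real run
    CHUNK_ROWS = 1_000_000  # scaled down from 100_000_000 so the sketch runs

    def read_chunk(i: int) -> pd.DataFrame:
        # Placeholder for the real read (BigQuery, Parquet, CSV, ...):
        # three 8-byte float columns per row, as described above.
        return pd.DataFrame(np.random.rand(CHUNK_ROWS, 3), columns=["a", "b", "c"])

    # Concatenating once at the end, instead of inside the loop, avoids
    # re-copying the accumulated frame on every iteration and keeps peak
    # memory closer to the size of the final frame.
    chunks = [read_chunk(i) for i in range(N_CHUNKS)]
    df = pd.concat(chunks, ignore_index=True)
    print(df.memory_usage().sum() / 1e9, "GB")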

All of a sudden, the notebook becomes unresponsive with a 502 error.

I realized that the virtual machine had crashed.

Here is the spec of the virtual machine: n1-standard-64, 240 GB RAM, 100 GB drive.

One time I successfully reached 2 billion rows, but on another run, to my dismay, it crashed with that error.

The Google documentation just says to restart the kernel. That is not so easy when it took more than an hour to read the 2 billion rows of data; more than an hour of work is wasted each time.

What is causing this error? Why does it occur so inconsistently? Where is the error message for this crash? Or is this an error related to the Pandas DataFrame? I am creating a DataFrame that has 2 billion rows. If Pandas cannot handle rows of this magnitude, it should simply raise a runtime error, not crash the virtual machine.

Thanks in advance

Upvotes: 2

Views: 1290

Answers (1)

Jose Gutierrez Paliza

Reputation: 1428

This error happens because the code runs into port overlaps. It is supposed to be fixed, since the part of the code that stops the kernel has been changed on GitHub: the change replaced restart_kernel with shutdown_kernel.
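The change itself lives in the service's code on GitHub, but jupyter_client exposes both calls, so here is a minimal sketch (assuming jupyter_client's KernelManager API) of the difference between the two:

    from jupyter_client import KernelManager

    km = KernelManager()
    km.start_kernel()

    # restart_kernel stops the kernel and immediately starts a new one,
    # reusing the same connection file and ports by default; that reuse
    # is where port overlaps can occur if cleanup is incomplete.
    km.restart_kernel(now=True)

    # shutdown_kernel stops the kernel and cleans up its connection file
    # and ports, so the next kernel starts from a clean slate.
    km.shutdown_kernel(now=True)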

We also need to be sure that the container is cleaned up when shutting down the kernel.

To verify this, you can follow these steps (a scripted version of them follows the list):

  • Create a notebook
  • Run a few cells
  • Kill the kernel
  • Start a new kernel
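A scripted version of those steps (an illustration using jupyter_client; in practice you would click through the JupyterLab UI):

    from jupyter_client import KernelManager

    # Create/start a kernel, as opening a notebook would.
    km = KernelManager()
    km.start_kernel()
    kc = km.client()
    kc.start_channels()
    kc.wait_for_ready(timeout=60)

    # "Run a few cells."
    kc.execute("x = 1 + 1")
    kc.get_shell_msg(timeout=30)

    # "Kill the kernel" with a full shutdown so its ports and
    # connection file are released.
    kc.stop_channels()
    km.shutdown_kernel(now=True)

    # "Start a new kernel" -- it should come up cleanly, without
    # colliding with the old kernel's ports.
    km2 = KernelManager()
    km2.start_kernel()
    km2.shutdown_kernel(now=True)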

Upvotes: 1
