Reputation: 1618
I am executing this simple code on Databricks:
df = spark.read.table(table_name).sample(fraction=0.1)
my_df = df.collect()
I'm accessing an external managed Delta table in my Unity Catalog; the table has the following properties:
Size: 3.1 GiB, 13 files
Columns: 13
The cluster I am using is quite big, an m4.10xlarge with 160 GB of memory, and I'm also heavily downsampling the data.
The code executes quite quickly, but then it gets stuck forever here:
And the cluster is doing absolutely nothing in the meanwhile. I looked at the logs, and the only thing that caught my attention is:
2024-09-20T11:10:57.351+0000: [GC (Allocation Failure) [PSYoungGen: 36727934K->1398166K(35979264K)] 37066227K->1736467K(119453184K), 13.8056805 secs] [Times: user=68.37 sys=1.95, real=13.80 secs]
That line is spammed until eventually the cluster simply dies, I think because of some timeout set by Databricks.
Any idea where to start debugging this? The table has already been optimized, as at first I thought it could have been a partitioning problem.
Upvotes: 0
Views: 182
Reputation: 36
What's the driver size? The collect() method brings all the data to the driver, and if the amount of data being fetched is close to the driver's available memory, it will result in heavy garbage collection.
If display(df) works without any problem, the driver size is the likely culprit.
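As a minimal sketch of how I'd narrow it down (assuming the same table_name and the default spark session on the cluster): first check what the driver was actually given, confirm the distributed read is fine, and only then collect a bounded number of rows instead of the whole 10% sample.

# Check what the driver JVM was allocated; on Databricks this is often much less
# than the total cluster memory (the value may not be explicitly set on some runtimes).
print(spark.sparkContext.getConf().get("spark.driver.memory", "not explicitly set"))

df = spark.read.table(table_name).sample(fraction=0.1)

# Runs entirely on the executors; if this completes quickly, reading the table is fine.
print(df.count())

# Collect only a bounded preview instead of the whole sampled result.
preview = df.limit(1000).collect()

If limit(1000).collect() returns quickly while the full collect() hangs with the same GC spam, the driver is the bottleneck: either pick a larger driver node (or raise spark.driver.memory), or keep the data distributed and write it out rather than collecting it.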
Upvotes: 0