Reputation: 21
I received the error "Row group size is too large (2111110000)" while writing a DataFrame into Parquet files.
I am assuming this is due to trying to load a table from SQL Server that has very large rows (with a maximum row size of 2.5 GB).
Because of this, I am unable to load the data.
Source table details:
- Number of rows: 1000
- Total size: 200 GB
- Issue: one particular column contains a huge JSON string, which causes this error.
Questions:
1. How can I load this data into Parquet?
2. Can Parquet handle such a huge column with a value of 1.5 GB?
3. Does Parquet try to store one column in a single row group, or will it split the data across multiple row groups if the column size is huge?
I have tried to write the DataFrame to a Parquet table in Spark, but I receive an error related to Parquet internals.
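Roughly, the load looks like the sketch below; the server, database, table name, and credentials are placeholders, not my real values:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SqlServerToParquet").getOrCreate()

# Read the source table from SQL Server over JDBC
# (URL, table name, and credentials below are placeholders)
df = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://<server>:1433;databaseName=<database>") \
    .option("dbtable", "dbo.<table>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()

# Writing to Parquet is where the "Row group size is too large" error occurs
df.write.mode("overwrite").parquet("/path/to/output.parquet")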
Upvotes: 0
Views: 530
Reputation: 3250
You can write the DataFrame to Parquet format with a specified row group size.
Approach 1:
Increasing spark.executor.memory can help with errors caused by large row group sizes. When you are working with large datasets that require more memory, setting a higher memory allocation for the executors can resolve memory-related errors.
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("ExampleApp") \
.config("spark.executor.memory", "8g") \
.getOrCreate()
Results:
Executor memory: 8g
Approach 2:
I have tried the approach below as a workaround.
As you mentioned, one column holds a huge JSON string, and that value ends up in a single row group. String columns usually compress well, and Parquet applies dictionary encoding to them, but when the column size within a row group is what causes the problem, reducing the row group size can help.
You can try the following:
df = spark.read.format("delta").load("/FileStore/tables/dilip02")
df.write.format("parquet").mode("overwrite").option("rowGroupSize", 1000000).save("/file_new.parquet")
Results:
Number of rows: 3
root
|-- id: long (nullable = true)
|-- json_column: string (nullable = true)
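If the per-write option is not picked up in your environment, the same row group size can also be set session-wide through the Hadoop configuration. This is a minimal sketch of that variant, reusing the example path from above; the 1000000-byte value is only illustrative:
from pyspark.sql import SparkSession

# spark.hadoop.* config entries are copied into the Hadoop configuration,
# so parquet.block.size (the row group size in bytes) applies to all Parquet writes
spark = SparkSession.builder \
    .appName("ExampleApp") \
    .config("spark.hadoop.parquet.block.size", 1000000) \
    .getOrCreate()

df = spark.read.format("delta").load("/FileStore/tables/dilip02")
df.write.format("parquet").mode("overwrite").save("/file_new.parquet")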
Reference: When Parquet Columns Get Too Big
You can also refer to the solution provided by @Tagar (SO link).
Upvotes: 0