alex

Reputation: 2193

Spark read database rows in chunks?

I am querying a quite large database table using the spark.read.jdbc method and am getting the following error:

com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (15,913,800 > 4,194,304)

which indicates that a single result packet exceeds the server's maximum allowed packet size (4,194,304 bytes here).
I don't have the option to alter the database settings, and I need to retrieve all of the data, so I would like to read it in chunks and end up with a DataFrame. How can I achieve this?

For example, in Python I can query a database with pandas and read the result in chunks (docs).
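Something like this pandas sketch is what I mean (the connection string, table name, and the process function are placeholders):

    # Minimal sketch of chunked reading with pandas; connection details,
    # table name, and process() are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@dbhost:3306/mydb")

    # chunksize makes read_sql return an iterator of DataFrames,
    # each holding at most 10,000 rows.
    for chunk in pd.read_sql("SELECT * FROM my_table", engine, chunksize=10_000):
        process(chunk)  # handle each chunk instead of the full result at once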

Upvotes: 0

Views: 3137

Answers (1)

Alex Ott

Reputation: 87154

If you look at the documentation, you will find the fetchsize option, which you can pass to spark.read.jdbc...
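For example (a minimal sketch; the URL, table name, and credentials below are placeholders for your own connection details):

    # Sketch of passing fetchsize via the JDBC data source options;
    # host, database, table, and credentials are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-chunked-read").getOrCreate()

    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://dbhost:3306/mydb")
        .option("dbtable", "my_table")
        .option("user", "user")
        .option("password", "password")
        .option("fetchsize", "1000")  # rows fetched per round trip, instead of the whole result at once
        .load()
    )

This controls how many rows the JDBC driver fetches per round trip, so the result arrives in smaller batches rather than one oversized packet.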

Upvotes: 1
