Reputation: 21
I am trying to read a table from a MySQL database using the JDBC connector in PySpark. My script to read the table is:
query = "SELECT * FROM C WHERE hitId = 4235441"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": tableName,
    "user": user,
    "password": password,
    "query_custom": query
}
saveLocation = mountPoint + "/" + tableName
print(saveLocation)
readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()
readDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(saveLocation)
I am trying to read only the rows that have a hitId of 4235441, but the whole table is being read instead of just the rows matching the custom query. Can anyone point out what is wrong in my script, or suggest another way to achieve this?
I have been stuck for quite a while, so any help is highly appreciated.
Upvotes: 0
Views: 618
Reputation: 5480
In readConfig, for the dbtable option you are specifying the table name, while query_custom is not an option the Spark JDBC source recognizes, so it is ignored and the whole table named in dbtable is read. Instead, pass the query itself as dbtable. Since Spark embeds the dbtable value in a FROM clause, the query has to be wrapped in parentheses and given an alias, like below:
query = "(SELECT * FROM C WHERE hitId = 4235441) AS filtered"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": query,  # subquery needs parentheses and an alias, since Spark puts it in a FROM clause
    "user": user,
    "password": password,
}
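Alternatively, if you are on Spark 2.4 or later, the JDBC source also accepts a dedicated query option that takes the statement directly, with no subquery alias needed (do not set dbtable and query at the same time). A minimal sketch, reusing the driver, url, user, password, and tableSchema variables from the question:

# plain SELECT; no parentheses or alias required with the query option
query = "SELECT * FROM C WHERE hitId = 4235441"
readConfig = {
    "driver": driver,
    "url": url,
    "query": query,
    "user": user,
    "password": password,
}
readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()

Either way, the WHERE clause is executed by MySQL, so only the matching rows are transferred to Spark; readDF.explain() will show the query in the JDBC scan node.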
Upvotes: 2