Reputation: 21
I am trying to read a table from a MySQL database using the JDBC connector in PySpark. My script to read the table is:
query = "SELECT * FROM C WHERE hitId = 4235441"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": tableName,
    "user": user,
    "password": password,
    "query_custom": query
}
saveLocation = mountPoint + "/" + tableName
print(saveLocation)
readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()
readDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(saveLocation)
I am trying to read only the rows that have a hitId of 4235441, but the whole table is being read instead of just the rows matching the custom query. Can anyone point out what is wrong in my script, or suggest another way to achieve this?
I have been stuck for quite a while, so any help is highly appreciated.
Upvotes: 0
Views: 618
Reputation: 5480
In readConfig, for the dbtable option you are specifying the table name, while query_custom is not an option the Spark JDBC source recognizes, so it is ignored and the whole table named in dbtable is read. Instead, pass the query itself as dbtable. Since Spark embeds the dbtable value in a FROM clause, the query has to be wrapped in parentheses and given an alias, like below:
query = "(SELECT * FROM C WHERE hitId = 4235441) AS filtered"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": query,  # subquery needs parentheses and an alias, since Spark puts it in a FROM clause
    "user": user,
    "password": password,
}
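Alternatively, if you are on Spark 2.4 or later, the JDBC source also accepts a dedicated query option that takes the statement directly, with no subquery alias needed (do not set dbtable and query at the same time). A minimal sketch, reusing the driver, url, user, password, and tableSchema variables from the question:

# plain SELECT; no parentheses or alias required with the query option
query = "SELECT * FROM C WHERE hitId = 4235441"
readConfig = {
    "driver": driver,
    "url": url,
    "query": query,
    "user": user,
    "password": password,
}
readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()

Either way, the WHERE clause is executed by MySQL, so only the matching rows are transferred to Spark; readDF.explain() will show the query in the JDBC scan node.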
Upvotes: 2