Reputation: 21
I am reading from a remote MySQL database using Kafka and passing the data to PySpark Streaming to apply sentiment analysis, then updating a column in the MySQL database with the result. Is that the best approach, or is there a more efficient architecture?
Upvotes: 0
Views: 275
Reputation: 1972
Your question is a bit vague, but here goes.
Yes, you can write a Spark DataFrame into any JDBC database using df.write.jdbc(). Here's the doc: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameWriter.jdbc
Your code would look something like this:
url = "jdbc:mysql://localhost/foobar"
properties = {
"user": "foo",
"password": "bar"
}
df.write.jdbc(url=url, table="tablename", mode="overwrite", properties=properties)
This gives you four write modes:
append: Append contents of this DataFrame to existing data.
overwrite: Overwrite existing data.
ignore: Silently ignore this operation if data already exists.
error or errorifexists (default case): Throw an exception if data already exists.
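Note that df.write.jdbc() is a batch writer; in Structured Streaming you would typically call it from inside foreachBatch, so each micro-batch is written with the plain JDBC API. A minimal sketch, assuming a streaming DataFrame scored_df that already carries your sentiment column (the table name and credentials below are placeholders, not from your setup):

# Sketch only: write each streaming micro-batch to MySQL via JDBC.
# scored_df, the table name and the credentials are placeholders.
url = "jdbc:mysql://localhost/foobar"
properties = {"user": "foo", "password": "bar"}

def write_batch(batch_df, batch_id):
    # Each micro-batch is an ordinary DataFrame, so the batch JDBC writer applies
    batch_df.write.jdbc(url=url, table="tablename", mode="append", properties=properties)

query = scored_df.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()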
Alternatively, since you're using Spark Streaming, why not produce your DataFrame changelog to a new Kafka topic and sync that topic into your MySQL database with a tool like Kafka Connect? That gives you a bit more control over how you insert/update/upsert data in your table.
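If you go that route, the streaming job publishes its results back to Kafka instead of writing JDBC directly, and a Kafka Connect JDBC sink (configured separately) handles the inserts/upserts into MySQL. A rough sketch, where the topic name, the id key column and scored_df are assumptions on my part:

# Sketch only: publish scored rows as key/value messages to a changelog topic.
out = (scored_df
    .selectExpr("CAST(id AS STRING) AS key",        # message key, e.g. the row's primary key
                "to_json(struct(*)) AS value")      # full row serialized as JSON
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "sentiment-changelog")
    .option("checkpointLocation", "/tmp/checkpoints/sentiment-changelog")
    .start())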
Upvotes: 1