Raef ElSayed

Reputation: 21

Can I update a value in a table in a remote MySQL DB using PySpark?

I am reading from a remote MySQL database using Kafka, passing the data to PySpark Streaming to apply sentiment analysis, and then updating a column in the MySQL database with the result. Is that the best approach, or is there a more efficient architecture?

Upvotes: 0

Views: 275

Answers (1)

Napoleon Borntoparty

Reputation: 1972

Your question is a bit vague, but here goes.

Yes, you can write a Spark DataFrame into any JDBC-compatible database using df.write.jdbc(). Here's the doc: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameWriter.jdbc

Your code would look something like this:

# JDBC connection URL for the target MySQL database
url = "jdbc:mysql://localhost/foobar"

properties = {
    "user": "foo",
    "password": "bar",
    # MySQL Connector/J driver class; the driver jar must be on Spark's classpath
    "driver": "com.mysql.cj.jdbc.Driver"
}

# Write the DataFrame to the given table, replacing any existing data
df.write.jdbc(url=url, table="tablename", mode="overwrite", properties=properties)

This gives you four write modes:

append: Append contents of this DataFrame to existing data.

overwrite: Overwrite existing data.

ignore: Silently ignore this operation if data already exists.

error or errorifexists (default case): Throw an exception if data already exists.
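One caveat: if your DataFrame comes out of Structured Streaming, you can't call .write.jdbc() on it directly; the usual workaround is foreachBatch, which hands you each micro-batch as a plain DataFrame. A minimal sketch, reusing the url and properties above and assuming a hypothetical streaming DataFrame sentiment_df:

def write_batch_to_mysql(batch_df, batch_id):
    # Each micro-batch is an ordinary DataFrame, so the normal JDBC writer applies.
    # "append" avoids wiping the table on every trigger; swap in your own upsert logic as needed.
    batch_df.write.jdbc(url=url, table="tablename", mode="append", properties=properties)

query = sentiment_df.writeStream \
    .foreachBatch(write_batch_to_mysql) \
    .start()

query.awaitTermination()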

Alternatively, since you're using Spark Streaming, why not produce your DataFrame changelog as a new Kafka topic and sync that topic into your MySQL table with a tool like Kafka Connect? That leaves you with a bit more control over how you insert/update/upsert data in your table. A sketch of the producing side follows below.
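For the first leg of that pipeline, Structured Streaming can write straight back to Kafka (this needs the spark-sql-kafka package on the classpath). A rough sketch, again assuming a hypothetical streaming DataFrame sentiment_df with an id column, a broker at localhost:9092, and a topic named sentiment-updates, all placeholders; a JDBC sink connector would then consume the topic into MySQL:

from pyspark.sql.functions import col, struct, to_json

# The Kafka sink expects string/binary "key" and "value" columns,
# so serialize each row as JSON.
kafka_ready = sentiment_df.select(
    col("id").cast("string").alias("key"),
    to_json(struct(*sentiment_df.columns)).alias("value")
)

query = kafka_ready.writeStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("topic", "sentiment-updates") \
    .option("checkpointLocation", "/tmp/checkpoints/sentiment-updates") \
    .start()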

Upvotes: 1
