user554481

Reputation: 2075

PySpark: refer to a table created using SQL

When I create a table using SQL in Spark, for example:

sql('CREATE TABLE example AS SELECT a, b FROM c')

How can I pull that table into the Python namespace (I can't think of a better term) so that I can update it? Let's say I want to replace NaN values in the table like so:

import pyspark.sql.functions as F

table = sql('SELECT * FROM example')
for column in table.columns:
    # Replace NaN with null; keep every other value unchanged
    table = table.withColumn(column, F.when(F.isnan(F.col(column)), None).otherwise(F.col(column)))

Does this operation update the original example table created with SQL? If I were to run sql('SELECT * FROM example').show(), would I see the updated results? When the original CREATE TABLE example ... SQL runs, is example automatically added to the Python namespace?

Upvotes: 0

Views: 111

Answers (1)

botchniaque

Reputation: 5124

The sql function returns a new DataFrame, and withColumn likewise returns a new DataFrame, so the original table is not modified. If you want to write a DataFrame's contents into a table created in Spark, do it like this:

table.write.mode("append").saveAsTable("example")

But what you are doing is actually changing the schema of a table. In that case, register the DataFrame as a temporary view and create a new table from it:

table.createOrReplaceTempView("mytempTable")
sql("CREATE TABLE example2 AS SELECT * FROM mytempTable")

Upvotes: 1
