phan hien
phan hien

Reputation: 81

Pyspark - saveAsTable - How to Insert new data to existing table?

How to Insert new data to existing table???

I'm trying to insert new data to existing table using pyspark.

This is my program

from pyspark import SparkContext
from pyspark.sql import SQLContext, DataFrameWriter

sc = SparkContext("local[*]", "SPARK-SQL")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("people.json")
df.registerTempTable("people")

# Show old data
result = sqlContext.sql("SELECT * from people")
result.show()

# Create new data
new_data = [{"name": "Phan", "age": 22}]
df_new_data = sqlContext.createDataFrame(new_data)
# Save data to table 'people'
df_new_data.write.mode("append").saveAsTable("people")

# Show new data
result = sqlContext.sql("SELECT * from people")
result.show()

After I run it. Data in table "people" can not be changed.

Old data
+---+--------+
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+
New data
+---+--------+                                                                  
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+

Please help me to change data in table!!! Thank you!

Upvotes: 4

Views: 27023

Answers (2)

notNull
notNull

Reputation: 31540

>>> df_new_data.write.mode("append").saveAsTable("people")

The above code writes people table in default database in hive.

So if you want to see the data from hive table you need to create HiveContext then view results from hive table instead of temporary table.

>>> hc=HiveContext(sc)
>>> hc.sql("select * from default.people").show(100,False)

UPDATE:

Append new data to temporary table:

>>> df1=df
>>> df2=df.unionAll(df1)
>>> df2.registerTempTable("people")
>>> sqlContext.sql("select * from people").show(100,False)

Upvotes: 1

phan hien
phan hien

Reputation: 81

I tried to saveAsTable with a table name does not exist.

df_new_data.write.mode("append").saveAsTable("people1")

# Show new data
result = sqlContext.sql("SELECT * from people1")
result.show()

It worked. And I can see new data in table "people1"

+---+----+
|age|name|
+---+----+
|22 |Phan|
+---+----+

Upvotes: 4

Related Questions