srini

Reputation: 39

pyspark: dataframes write to parquet

I get the following error when running a PySpark script to load a parquet table. I don't have an issue when testing through the pyspark shell.

Interactive mode works fine:

 df_writer = pyspark.sql.DataFrameWriter(df)
 df_writer.saveAsTable('test', format='parquet', mode='overwrite',path='xyz/test_table.parquet')

Script mode throws an error:

/opt/mapr/spark/spark-2.0.1/bin/spark-submit --jars /opt/mapr/spark/spark-2.0.1/jars/commons-csv-1.2.jar /home/mapr/scripts/pyspark_load.py
17/02/17 14:57:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/mapr/scripts/2_pyspark_load.py", line 23, in <module>
    df_writer = pyspark.sql.DataFrameWriter(df)
NameError: name 'pyspark' is not defined

Upvotes: 2

Views: 12578

Answers (2)

Jeril

Reputation: 8551

You can also save your dataframe in a much easier way:

df.write.parquet("xyz/test_table.parquet", mode='overwrite')
# 'df' is your PySpark dataframe
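
The same write can also be expressed through the mode() builder, and the file can be read back with spark.read; a minimal sketch, assuming an existing SparkSession named 'spark':

    # builder-style equivalent; 'overwrite' replaces any existing data at the path
    df.write.mode('overwrite').parquet("xyz/test_table.parquet")

    # read the parquet data back into a new dataframe to verify the write
    df_check = spark.read.parquet("xyz/test_table.parquet")
    df_check.show(5)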

Upvotes: 5

Grr

Reputation: 16109

The difference between interactive mode and spark-submit for my scripts is that I have to import pyspark. For example:

import pyspark.sql  # importing the submodule explicitly ensures pyspark.sql is available

df_writer = pyspark.sql.DataFrameWriter(df)
# Rest of Code
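
In a submitted script you typically also have to create the SparkSession yourself, since spark-submit does not pre-create one the way the pyspark shell does. A minimal standalone sketch, reusing the table name and path from the question (the CSV input file here is hypothetical, just to have a dataframe to write):

    from pyspark.sql import SparkSession

    # the pyspark shell pre-creates 'spark'; a submitted script must build its own session
    spark = SparkSession.builder.appName("pyspark_load").getOrCreate()

    # hypothetical input, for illustration only
    df = spark.read.csv("input.csv", header=True)

    # same saveAsTable call as in the question
    df.write.saveAsTable('test', format='parquet', mode='overwrite',
                         path='xyz/test_table.parquet')

    spark.stop()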

Upvotes: 0
