Paam

Reputation: 141

Writing spark.sql dataframe result to parquet file

I created the following Spark session:

from pyspark.sql import SparkSession

# Create the Spark session with Hive support
spark = SparkSession.builder.appName("appName").enableHiveSupport().getOrCreate()

and am able to see the results of the following query:

spark.sql("select year(plt_date) as Year, month(plt_date) as Mounth, count(build) as B_Count, count(product) as P_Count from first_table full outer join second_table on key1=CONCAT('SS',key_2) group by year(plt_date), month(plt_date)").show()

However, when I try to write the resulting dataframe from this query to hdfs, I get the following error:

saving spark.sql.dataframe.DataFrame in hdfs

I am able to save the resulting dataframe of a simpler version of this query to the same path. The problem appears when I add functions such as count(), year(), etc.

What is the problem, and how can I save the results to hdfs?

Upvotes: 2

Views: 449

Answers (1)

Ajinkya Bhore

Reputation: 174

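The error is caused by the '(' in the column name 'year(CAST(plt_date AS DATE))'. Spark's Parquet writer rejects column names that contain characters such as parentheses, spaces, or commas.

Rename the column with an alias: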

data = data.selectExpr("year(CAST(plt_date AS DATE)) as nameofcolumn")
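If several columns have generated names, aliasing each one by hand gets tedious. Below is a minimal sketch of a helper that cleans every column name at once. It assumes the rejected characters are space, comma, semicolon, braces, parentheses, newline, tab, and '=', which is the set Spark has reported in this error in the versions I have seen; the helper names and the output path are hypothetical, not part of any Spark API.

```python
import re

# Assumption: characters Spark's Parquet writer has rejected in column names.
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name):
    """Replace each rejected character with '_' and tidy up the result."""
    cleaned = _INVALID_CHARS.sub("_", name)
    cleaned = re.sub(r"_+", "_", cleaned)  # collapse runs of underscores
    return cleaned.strip("_")              # drop leading/trailing underscores


def sanitize_columns(names):
    """Apply sanitize_column_name to a list of column names."""
    return [sanitize_column_name(n) for n in names]


# Usage with an existing DataFrame `df` (needs a live Spark session,
# so it is left as a comment here; the path is a placeholder):
# df = df.toDF(*sanitize_columns(df.columns))
# df.write.mode("overwrite").parquet("hdfs:///some/hypothetical/path")
```

Explicit aliases in the query are still the clearer option when there are only a few columns, since they give readable names instead of mechanically derived ones.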


See also: Rename Spark Column

Upvotes: 3
