Gayatri

Reputation: 2253

PySpark - Saving Hive Table - org.apache.spark.SparkException: Cannot recognize hive type string

I am saving a Spark DataFrame to a Hive table. The DataFrame holds a nested JSON data structure. I am able to save the DataFrame as files, but it fails at the point where it creates a Hive table on top of them, with: org.apache.spark.SparkException: Cannot recognize hive type string

I cannot create a Hive table schema first and then insert into it, since the DataFrame consists of a couple hundred nested columns.

So I am saving it as:

df.write.partitionBy("dt","file_dt").saveAsTable("df")

I am not able to figure out what the issue is.

Upvotes: 4

Views: 5643

Answers (1)

Gayatri

Reputation: 2253

The issue turned out to be a few columns that were named as bare numbers: "1", "2", "3". Removing those columns from the DataFrame let me create the Hive table without any errors.
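An alternative to dropping such columns is to rename them before calling saveAsTable, since the data in them may still be needed. A minimal sketch: the helper `sanitize_columns` and the `col_` prefix are my own illustrative names, not part of any Spark API.

```python
def sanitize_columns(columns, prefix="col_"):
    """Prefix purely numeric column names so Hive's type parser accepts them."""
    return [prefix + c if c.isdigit() else c for c in columns]

# Usage against a Spark DataFrame (sketch, assuming `df` exists):
# for old, new in zip(df.columns, sanitize_columns(df.columns)):
#     if old != new:
#         df = df.withColumnRenamed(old, new)
# df.write.partitionBy("dt", "file_dt").saveAsTable("df")

print(sanitize_columns(["1", "2", "value", "dt"]))
```

Note this only handles top-level column names; numeric field names nested inside struct columns would need the nested schema rebuilt as well.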

Upvotes: 3
