Reputation: 1655
I am new to Spark and Scala i am stuck on this exception, I am trying to add some extra fields, i.e. StructField to an existing StructType retrieved from Data Frame for a column using Spark SQL and gettting below exception.
code snippet:
val dfStruct:StructType=parquetDf.select("columnname").schema
dfStruct.add("newField","IntegerType",true)
Exception in thread "main"
org.apache.spark.sql.types.DataTypeException: Unsupported dataType: IntegerType. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
at org.apache.spark.sql.types.DataTypeParser$class.toDataType(DataTypeParser.scala:95)
at org.apache.spark.sql.types.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:107)
at org.apache.spark.sql.types.DataTypeParser$.parse(DataTypeParser.scala:111)
I can see there some open issues running on jira related to this exception but not able to understand much. I am using Spark 1.5.1 version
https://issues.apache.org/jira/browse/SPARK-9685
Upvotes: 1
Views: 8187
Reputation: 330093
When you use StructType.add
with a following signature:
add(name: String, dataType: String, nullable: Boolean)
dataType
string should correspond to either .simpleString
or .typeName
. For IntegerType
it is either int
:
import org.apache.spark.sql.types._
IntegerType.simpleString
// String = int
or integer
:
IntegerType.typeName
// String = integer
so what you need is something like this:
val schema = StructType(Nil)
schema.add("foo", "int", true)
// org.apache.spark.sql.types.StructType =
// StructType(StructField(foo,IntegerType,true))
or
schema.add("foo", "integer", true)
// org.apache.spark.sql.types.StructType =
// StructType(StructField(foo,IntegerType,true))
If you want to pass IntegerType
it has to be DataType
not String
:
schema.add("foo", IntegerType, true)
Upvotes: 1