Reputation: 471
I'm trying to import data in parquet format with a custom schema, but it returns: TypeError: option() missing 1 required positional argument: 'value'
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, FloatType

ProductCustomSchema = StructType([
    StructField("id_sku", IntegerType(), True),
    StructField("flag_piece", StringType(), True),
    StructField("flag_weight", StringType(), True),
    StructField("ds_sku", StringType(), True),
    StructField("qty_pack", FloatType(), True)])

def read_parquet_(path, schema):
    return spark.read.format("parquet")\
        .option(schema)\
        .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")\
        .load(path)
product_nomenclature = 'C:/Users/alexa/Downloads/product_nomenc'
product_nom = read_parquet_(product_nomenclature, ProductCustomSchema)
Upvotes: 17
Views: 56556
Reputation: 2980
As mentioned in the comments you should change .option(schema) to .schema(schema). option() requires you to specify a key (the name of the option you're setting) and a value (the value you want to assign to that option). You are getting the TypeError because you were just passing a variable called schema to option() without specifying which option you were actually trying to set with that variable.
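A minimal sketch of the corrected reader, keeping your original function signature (assuming an existing SparkSession named spark, as in your snippet):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def read_parquet_(path, schema):
    # Pass the StructType through .schema(), not .option();
    # .option() always takes a (key, value) pair, like timestampFormat below.
    return spark.read.format("parquet")\
        .schema(schema)\
        .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")\
        .load(path)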
The QueryExecutionException you posted in the comments is being raised because the schema you've defined in your schema variable does not match the data in your DataFrame. If you're going to specify a custom schema, you must make sure that schema matches the data you are reading. In your example the column id_sku is stored as a BinaryType, but in your schema you're defining the column as an IntegerType. pyspark will not try to reconcile differences between the schema you provide and the actual types in the data; it will throw an exception instead.
To fix your error, make sure the schema you're defining correctly represents your data as it is stored in the parquet file (i.e. change the datatype of id_sku in your schema to BinaryType). The benefit of doing this is a slight performance gain from not having to infer the file schema each time the parquet file is read.
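A sketch of the adjusted schema, assuming id_sku is the only mismatched column (you can verify the actual stored types with spark.read.parquet(path).printSchema()):

from pyspark.sql.types import StructType, StructField, BinaryType, StringType, FloatType

ProductCustomSchema = StructType([
    StructField("id_sku", BinaryType(), True),   # matches the BinaryType stored in the parquet file
    StructField("flag_piece", StringType(), True),
    StructField("flag_weight", StringType(), True),
    StructField("ds_sku", StringType(), True),
    StructField("qty_pack", FloatType(), True)])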
Upvotes: 11