Reputation: 451
When I read other people's Python code, such as spark.read.option("mergeSchema", "true"), it seems the author already knew which parameters to use. But for a beginner, is there a place to look up the available parameters? I looked in the Apache documentation, and it says the parameter is undocumented.
Upvotes: 45
Views: 80459
Reputation: 9
You will find more options in the Spark API documentation for the csv method of the class org.apache.spark.sql.DataFrameReader. As shown above, the available options depend on the input format being read.
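For illustration, a minimal PySpark sketch using a few of the documented csv options (the file path is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-options-demo").getOrCreate()

df = (
    spark.read
    .option("header", "true")       # first line is a header row
    .option("inferSchema", "true")  # infer column types from the data
    .option("sep", ",")             # field delimiter
    .csv("/path/to/data.csv")
)
df.printSchema()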
Upvotes: 0
Reputation: 805
Annoyingly, the documentation for the option method is in the docs for the json method. The docs for that method list the options as follows (key -- value -- description):
primitivesAsString -- true/false (default false) -- infers all primitive values as a string type
prefersDecimal -- true/false (default false) -- infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.
allowComments -- true/false (default false) -- ignores Java/C++ style comments in JSON records
allowUnquotedFieldNames -- true/false (default false) -- allows unquoted JSON field names
allowSingleQuotes -- true/false (default true) -- allows single quotes in addition to double quotes
allowNumericLeadingZeros -- true/false (default false) -- allows leading zeros in numbers (e.g. 00012)
allowBackslashEscapingAnyCharacter -- true/false (default false) -- allows quoting of all characters using the backslash quoting mechanism
allowUnquotedControlChars -- true/false (default false) -- allows JSON strings to contain unquoted control characters (ASCII characters with a value less than 32, including tab and line-feed characters)
mode -- PERMISSIVE/DROPMALFORMED/FAILFAST (default PERMISSIVE) -- sets the mode for dealing with corrupt records during parsing
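For illustration, a minimal PySpark sketch applying a few of the options above (the file path is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-options-demo").getOrCreate()

df = (
    spark.read
    .option("allowComments", "true")      # tolerate Java/C++ style comments
    .option("allowSingleQuotes", "true")  # accept 'field' as well as "field"
    .option("mode", "DROPMALFORMED")      # drop corrupt records instead of failing
    .json("/path/to/records.json")
)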
Upvotes: 18
Reputation: 74
For built-in formats, all options are enumerated in the official documentation. Each format has its own set of options, so you have to refer to the one for the format you use.
For reads, open the docs for DataFrameReader and expand the docs for the individual methods. For the JSON format, say, expand the json method (only one variant contains the full list of options).
For writes, open the docs for DataFrameWriter and do the same; for Parquet, expand the parquet method.
Note that schema merging can be enabled either per read via the mergeSchema option (as in the question) or globally via a session property:
spark.conf.set("spark.sql.parquet.mergeSchema", "true")
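For comparison, a minimal sketch of both approaches (the directory path is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-schema-demo").getOrCreate()

# Per read: pass mergeSchema as a data source option, as in the question.
df1 = spark.read.option("mergeSchema", "true").parquet("/path/to/parquet_dir")

# Globally: set the session property so every Parquet read merges schemas.
spark.conf.set("spark.sql.parquet.mergeSchema", "true")
df2 = spark.read.parquet("/path/to/parquet_dir")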
Upvotes: 4