Reputation: 11
I'm new to PySpark, and tried reading an excel file in PySpark but got the error which I'm unable to comprehend. Please help me in fixing this. I've attached the code and the error.
f_name= 'EDI_matching_with_Nielsen_and_AFS_codes_v6.xlsx'
path= base_path_nielsen + f_name
schema_def= StructType([StructField('Orden', StringType(), 'True'), StructField('OPERATIONAL POINT', StringType(), 'True'), StructField('Store Name', StringType(), 'True'), StructField('Address', StringType(), 'True'), StructField('City', StringType(), 'True'), StructField('Postal Code', StringType(), 'True'), StructField('State', StringType(), 'True'), StructField('Market', StringType(), 'True'), StructField('Society', StringType(), 'True')])
store_info= spark.read.format('com.crealytics.spark.excel').schema(schema_def).option('header','true').load(path)
The error that I get is:
IllegalArgumentException Traceback (most recent call last)
<command-1547511391218165> in <module>()
4 schema_def= StructType([StructField('Orden', StringType(), 'True'), StructField('OPERATIONAL POINT', StringType(), 'True'), StructField('Store Name', StringType(), 'True'), StructField('Address', StringType(), 'True'), StructField('City', StringType(), 'True'), StructField('Postal Code', StringType(), 'True'), StructField('State', StringType(), 'True'), StructField('Market', StringType(), 'True'), StructField('Society', StringType(), 'True')])
5
----> 6 store_info= spark.read.format('com.crealytics.spark.excel').schema(schema_def).option('header','true').load(path)
/databricks/spark/python/pyspark/sql/readwriter.py in schema(self, schema)
105 spark = SparkSession.builder.getOrCreate()
106 if isinstance(schema, StructType):
--> 107 jschema = spark._jsparkSession.parseDataType(schema.json())
108 self._jreader = self._jreader.schema(jschema)
109 elif isinstance(schema, basestring):
/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco
IllegalArgumentException: 'Failed to convert the JSON string \'{"metadata":{},"name":"Orden","nullable":"True","type":"string"}\' to a field.'
You can view the data sample image attached as well. The data contains null values as well for all fields.
Thanks, to the community in advance.
Upvotes: 1
Views: 2324
Reputation: 71
I was facing the similar issue where I was passing the nullable values from a dictionary. The issue that I was facing was due to assigning the nullable value as String instead of Boolean(True/False). The issue got resolved after changing the values to Boolean. It should work for you by just removing the single quotes from your boolean value.
Upvotes: 1