user132277
user132277

Reputation: 11

PySpark error while reading .xlsx file: 'Failed to convert the JSON string to a field.'

I'm new to PySpark, and tried reading an excel file in PySpark but got the error which I'm unable to comprehend. Please help me in fixing this. I've attached the code and the error.

f_name= 'EDI_matching_with_Nielsen_and_AFS_codes_v6.xlsx'
path= base_path_nielsen + f_name

schema_def= StructType([StructField('Orden', StringType(), 'True'), StructField('OPERATIONAL POINT', StringType(), 'True'), StructField('Store Name', StringType(), 'True'), StructField('Address', StringType(), 'True'), StructField('City', StringType(), 'True'), StructField('Postal Code', StringType(), 'True'), StructField('State', StringType(), 'True'), StructField('Market', StringType(), 'True'), StructField('Society', StringType(), 'True')])

store_info= spark.read.format('com.crealytics.spark.excel').schema(schema_def).option('header','true').load(path)

The error that I get is:

IllegalArgumentException                  Traceback (most recent call last)
<command-1547511391218165> in <module>()
      4 schema_def= StructType([StructField('Orden', StringType(), 'True'), StructField('OPERATIONAL POINT', StringType(), 'True'), StructField('Store Name', StringType(), 'True'), StructField('Address', StringType(), 'True'), StructField('City', StringType(), 'True'), StructField('Postal Code', StringType(), 'True'), StructField('State', StringType(), 'True'), StructField('Market', StringType(), 'True'), StructField('Society', StringType(), 'True')])
      5 
----> 6 store_info= spark.read.format('com.crealytics.spark.excel').schema(schema_def).option('header','true').load(path)

/databricks/spark/python/pyspark/sql/readwriter.py in schema(self, schema)
    105         spark = SparkSession.builder.getOrCreate()
    106         if isinstance(schema, StructType):
--> 107             jschema = spark._jsparkSession.parseDataType(schema.json())
    108             self._jreader = self._jreader.schema(jschema)
    109         elif isinstance(schema, basestring):

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     77                 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78             if s.startswith('java.lang.IllegalArgumentException: '):
---> 79                 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80             raise
     81     return deco

IllegalArgumentException: 'Failed to convert the JSON string \'{"metadata":{},"name":"Orden","nullable":"True","type":"string"}\' to a field.'

You can view the data sample image attached as well. The data contains null values as well for all fields.

Sample data

Thanks, to the community in advance.

Upvotes: 1

Views: 2324

Answers (1)

Abhy
Abhy

Reputation: 71

I was facing the similar issue where I was passing the nullable values from a dictionary. The issue that I was facing was due to assigning the nullable value as String instead of Boolean(True/False). The issue got resolved after changing the values to Boolean. It should work for you by just removing the single quotes from your boolean value.

Upvotes: 1

Related Questions