Rocketq
Rocketq

Reputation: 5781

Extracting value from json from spark table gives SyntaxError error or keyType should be DataType error

Lets suppose I have such dataframe org below:

id |raw
123|{"inn":"123”, "prof": "tkie"}

I need to extract from json from column raw all ids values to new columns, how to do that?

I tried:

org.withColumn('inn', from_json($"raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))

got error:

File "", line 1 org.withColumn('inn', from_json($"raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn'))) ^ SyntaxError: invalid syntax

And:

org.withColumn('inn', from_json("raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))

error:

AssertionError Traceback (most recent call last) in () ----> 1 org.withColumn('inn', from_json("raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))
/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/types.py in init(self, keyType, valueType, valueContainsNull) 342 False 343 """ --> 344 assert isinstance(keyType, DataType), "keyType should be DataType" 345 assert isinstance(valueType, DataType), "valueType should be DataType" 346 self.keyType = keyType AssertionError: keyType should be DataType

Upvotes: 0

Views: 414

Answers (2)

blackbishop
blackbishop

Reputation: 32640

You code contains several problems:

  • $"raw" is used with Scala API, in Pyspark use col("raw") or directly as string "raw"
  • When using StringType or any other type, in python you need to add parentheses StringType()
  • getItem is a Column method but your calling it in string ('searchcard'.getItem('inn'))

Here's the same code corrected with complete working example using from_json function:

import pyspark.sql.functions as F
from pyspark.sql.types import StringType, MapType

org = spark.createDataFrame([
    (123, '{"inn":"123", "prof": "tkie"}')
], ["id", "raw"])

org.withColumn(
    'raw',
    F.from_json("raw", MapType(StringType(), StringType()))
).select(
    'id',
    F.col('raw').getItem('inn').alias('inn')
).show()

#+---+---+
#| id|inn|
#+---+---+
#|123|123|
#+---+---+

Upvotes: 1

Rocketq
Rocketq

Reputation: 5781

This method get_json_object did the trick!

org.withColumn('inn',get_json_object(org.raw, '$.inn')).show(1)

Upvotes: 0

Related Questions