Reputation: 35
I have a column with a JSON array and I'm trying to create a new column containing only part of the JSON, plus some potential transforms on the JSON data. I'm using the following Databricks page as a reference:
https://docs.azuredatabricks.net/_static/notebooks/transform-complex-data-types-python.html
ID | js1 |
---|---|
1 | {"a":1, "b":1} |
And I want to return:
ID | js1 | js2 |
---|---|---|
1 | [{"a":1, "b":1}] | [{"a":1}] |
I'm using a slightly cut-down version of the pseudo-method below for brevity.
def my_method(js):
    reader = spark.read
    reader.schema(schema)  # schema provided
    json = reader.json([js])  # <-- Error here
    return lit(str(json["a"]))

df.withColumn("js2", my_method(col("js1")))
The error I'm getting is "Column is not iterable". So how can I transform the contents of the JSON column and return a transformed block of JSON using withColumn?
Upvotes: 1
Views: 92
Reputation: 87174
Instead of accessing fields with [name], you need to use the map_filter function, like this (adjust the list of keys to keep):
from pyspark.sql.functions import map_filter

df.select(map_filter(
    "js1", lambda k, _: (k == 'a') | (k == 'c')
).alias("js2"))
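
Note that map_filter operates on a MapType column, so if js1 arrives as a JSON string (as in the question) it has to be parsed first. A minimal end-to-end sketch, assuming Spark 3.1+ (needed for Python lambdas in higher-order functions) and integer values in the map:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, map_filter, to_json
from pyspark.sql.types import MapType, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Sample data matching the shape in the question
df = spark.createDataFrame([(1, '{"a":1, "b":1}')], ["ID", "js1"])

# Parse the JSON string into a map, keep only the wanted keys,
# then serialize the filtered map back to a JSON string
result = df.withColumn(
    "js2",
    to_json(
        map_filter(
            from_json("js1", MapType(StringType(), IntegerType())),
            lambda k, _: (k == "a") | (k == "c"),
        )
    ),
)
result.show(truncate=False)  # js2 comes back as {"a":1}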
P.S. You can't use spark.read from inside a user-defined function; spark.read runs on the driver, while the UDF is evaluated on the executors, where no SparkSession is available.
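
If you do need the explicit schema the question's my_method was loading, the driver-side replacement for spark.read is from_json. A sketch, using a hypothetical two-field schema standing in for the one the question provides:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, struct, to_json
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, '{"a":1, "b":1}')], ["ID", "js1"])

# Hypothetical stand-in for the schema the question provides
schema = StructType([
    StructField("a", IntegerType()),
    StructField("b", IntegerType()),
])

# Parse once with from_json, then rebuild just the wanted fields as JSON
parsed = df.withColumn("parsed", from_json("js1", schema))
result = parsed.withColumn("js2", to_json(struct(col("parsed.a").alias("a"))))

The struct approach needs the field names at query-build time, whereas map_filter filters whatever keys appear in each row's data, which is why it's the better fit when the set of keys isn't fixed.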
Upvotes: 1