Reputation: 31
Here is an example of what I'm trying to accomplish:
Case:
| column1 | column2 | json_column |
|---|---|---|
| One | Two | {'A': '1', 'B': '2', 'C': '3'} |
Desired output:
| column1 | column2 | json_column | B |
|---|---|---|---|
| One | Two | {'A': '1', 'B': '2', 'C': '3'} | 2 |
As seen here, json_column has been parsed and a new column 'B' has been created containing the value of key 'B' from json_column.
Upvotes: 1
Views: 992
Reputation: 24448
If the column is of string type, you could use from_json:

F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B']
Full example:
from pyspark.sql import functions as F

df = spark.createDataFrame([('One', 'Two', "{'A': '1', 'B': '2', 'C': '3'}")], ['column1', 'column2', 'json_column'])
# Parse the JSON string using an explicit schema, then extract the value of key 'B'
df = df.withColumn('B', F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B'])
df.show(truncate=0)
# +-------+-------+------------------------------+---+
# |column1|column2|json_column |B |
# +-------+-------+------------------------------+---+
# |One |Two |{'A': '1', 'B': '2', 'C': '3'}|2 |
# +-------+-------+------------------------------+---+
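One thing worth noting: the sample string uses single quotes, which is not strict JSON. Spark's JSON reader tolerates this because its allowSingleQuotes option defaults to true, so from_json parses it anyway. As a plain-Python analogue (no Spark required), the snippet below shows that a strict JSON parser rejects the string while Python's literal parser accepts it; it is only an illustration of the quoting issue, not of Spark's implementation.

```python
import ast
import json

s = "{'A': '1', 'B': '2', 'C': '3'}"

# Strict JSON requires double-quoted keys/values, so json.loads rejects this string.
try:
    json.loads(s)
    strict_json_ok = True
except json.JSONDecodeError:
    strict_json_ok = False

# The same string is a valid Python dict literal, so ast.literal_eval parses it,
# and key 'B' can be extracted just like from_json(...)['B'] does in Spark.
value_of_b = ast.literal_eval(s)['B']

print(strict_json_ok)  # False
print(value_of_b)      # 2
```

If your real data is strict JSON (double quotes), both from_json and any standard JSON parser will handle it without this caveat.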
Upvotes: 2