Xahal
Xahal

Reputation: 31

Parse JSON column then add new column in the same dataframe with parsed value

Here is an example of what I'm trying to accomplish:

Case:

column1 column2 json_column
One Two {'A': '1', 'B': '2', 'C': '3'}

Desired output:

column1 column2 json_column B
One Two {'A': '1', 'B': '2', 'C': '3'} 2

As seen here, json_column has been parsed and a new column 'B' has been created containing the value of key 'B' in the json_column.

Upvotes: 1

Views: 992

Answers (1)

ZygD
ZygD

Reputation: 24448

If the column is of string type, you could use from_json:

F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B']

Full example:

from pyspark.sql import functions as F
df = spark.createDataFrame([('One', 'Two', "{'A': '1', 'B': '2', 'C': '3'}")], ['column1', 'column2', 'json_column'])

df = df.withColumn('B', F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B'])

df.show(truncate=0)
# +-------+-------+------------------------------+---+
# |column1|column2|json_column                   |B  |
# +-------+-------+------------------------------+---+
# |One    |Two    |{'A': '1', 'B': '2', 'C': '3'}|2  |
# +-------+-------+------------------------------+---+

Upvotes: 2

Related Questions