Reputation: 31
Here is an example of what I'm trying to accomplish:
Case:
| column1 | column2 | json_column |
|---|---|---|
| One | Two | {'A': '1', 'B': '2', 'C': '3'} |
Desired output:
| column1 | column2 | json_column | B |
|---|---|---|---|
| One | Two | {'A': '1', 'B': '2', 'C': '3'} | 2 |
As seen here, json_column has been parsed and a new column 'B' has been created containing the value of key 'B' from json_column.
Upvotes: 1
Views: 992
Reputation: 24448
If the column is of string type, you could use from_json:

F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B']
Full example:
from pyspark.sql import functions as F

df = spark.createDataFrame([('One', 'Two', "{'A': '1', 'B': '2', 'C': '3'}")], ['column1', 'column2', 'json_column'])
# Parse the JSON string using an explicit schema, then extract the value of key 'B'
df = df.withColumn('B', F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B'])
df.show(truncate=0)
# +-------+-------+------------------------------+---+
# |column1|column2|json_column |B |
# +-------+-------+------------------------------+---+
# |One |Two |{'A': '1', 'B': '2', 'C': '3'}|2 |
# +-------+-------+------------------------------+---+
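One thing worth noting: the sample string uses single quotes, which is not strict JSON. Spark's JSON reader tolerates this because its allowSingleQuotes option defaults to true, so from_json parses it anyway. As a plain-Python analogue (no Spark required), the snippet below shows that a strict JSON parser rejects the string while Python's literal parser accepts it; it is only an illustration of the quoting issue, not of Spark's implementation.

```python
import ast
import json

s = "{'A': '1', 'B': '2', 'C': '3'}"

# Strict JSON requires double-quoted keys/values, so json.loads rejects this string.
try:
    json.loads(s)
    strict_json_ok = True
except json.JSONDecodeError:
    strict_json_ok = False

# The same string is a valid Python dict literal, so ast.literal_eval parses it,
# and key 'B' can be extracted just like from_json(...)['B'] does in Spark.
value_of_b = ast.literal_eval(s)['B']

print(strict_json_ok)  # False
print(value_of_b)      # 2
```

If your real data is strict JSON (double quotes), both from_json and any standard JSON parser will handle it without this caveat.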
Upvotes: 2