steffsen
steffsen

Reputation: 45

Getting NULL values only from get_json_object in PySpark

I have a Spark Dataframe (in Palantir Foundry) with the column "c_temperature". This column contains a JSON string in each row with the following schema:

{"TempCelsiusEndAvg":"24.33","TempCelsiusEndMax":"null","TempCelsiusEndMin":"null","TempCelsiusStartAvg":"22.54","TempCelsiusStartMax":"null","TempCelsiusStartMin":"null","TempEndPlausibility":"T_PLAUSIBLE","TempStartPlausibility":"T_PLAUSIBLE"}

I tried to extract the values (they are sometimes "null" and sometimes wiht values like e.g. "24.33") of the avg temperatures in the new columns "TempCelsiusEndAvg" and "TempCelsiusStartAvg" with the following code:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def flat_json(sessions_finished):
    df = sessions_finished
    df = df.withColumn("new_temperature", F.col('c_temperature').cast(StringType())
    df = df.withColumn("TempCelsiusEndAvg", F.get_json_object("c_Temperature", '$.TempCelsiusEndAvg'))
    df = df.withColumn("TempCelsiusStartAvg", F.get_json_object("c_Temperature", '$.TempCelsiusStartAvg'))
    return df

I wanted to get the new columns filled with doubles like:

... +-----------------+-------------------+ ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
... +-----------------+-------------------+ ...
... |            24.33|              22.54| ...
... +-----------------+-------------------+ ...
... |            29.28|              25.16| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...

The new dataframe contains the columns but they are only filled with null values. Can anyone help me solving this problem?

... +-----------------+-------------------+ ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...

There is also a comment in this thread: [https://stackoverflow.com/questions/46084158/how-can-you-parse-a-string-that-is-json-from-an-existing-temp-table-using-pyspar] that describes my problem, but I have no idea how to use this information.

Upvotes: 3

Views: 1092

Answers (1)

Robert Kossendey
Robert Kossendey

Reputation: 7028

You are don't need to do anything, since the column is already a struct. You can create those columns by accessing them with a .

 df = df.withColumn("TempCelsiusEndAvg", F.col("c_temperature.TempCelsiusEndAvg"))
 df = df.withColumn("TempCelsiusStartAvg", F.col("c_temperature.TempCelsiusStartAvg"))

Upvotes: 1

Related Questions