activelearner

Reputation: 7745

Pyspark - Cast a column in a nested array

I have a dataframe with the following schema:

root
 |-- Id: long (nullable = true)
 |-- LastUpdate: string (nullable = true)
 |-- Info: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Purchase: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Amount: long (nullable = true)
 |    |    |    |    |-- Name: string (nullable = true)
 |    |    |    |    |-- Type: string (nullable = true)

How can I select the Amount column such that I can cast it?

Tried:

from pyspark.sql.types import DoubleType

df = df.withColumn("Info.Purchase.Amount", df["Info.Purchase.Amount"].cast(DoubleType()))

But got:

org.apache.spark.sql.AnalysisException: cannot resolve '`Info`.`Purchase`['Amount']'

Upvotes: 0

Views: 1789

Answers (1)

pardeep garg

Reputation: 219

You can use getField to extract the nested array:

from pyspark.sql.functions import col

df.select(col("Info").getField("Purchase").getField("Amount")).show()

This gives you the Amount values as a nested array column (array<array<bigint>>, since both Info and Purchase are arrays), which you can then cast.
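
For example, here is a minimal sketch assuming the schema from the question; the cast type string and the Amount alias are illustrative choices:

from pyspark.sql.functions import col

# getField on an array of structs returns an array of that field's
# values, so Info.Purchase.Amount resolves to array<array<bigint>> here.
# Cast the whole nested array to doubles in the same select.
df.select(
    col("Info").getField("Purchase").getField("Amount")
        .cast("array<array<double>>")
        .alias("Amount")
).show(truncate=False)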

Upvotes: 1
