Reputation: 25366
root
|
|-- dogs: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: struct (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- color: string (nullable = true)
| | | |-- sources: array (nullable = true)
| | | | |-- element: string (containsNull = true)
| | |-- _2: age (nullable = true)
Which shows below with data.select("dogs").show(2,False)
+---------------------------------------------------------------------------------+
|names |
+---------------------------------------------------------------------------------+
|[[[Max,White,WrappedArray(SanDiego)],3], [[Spot,Black,WrappedArray(SanDiego)],2]]|
|[[[Michael,Black,WrappedArray(SanJose)],1]] |
+---------------------------------------------------------------------------------+
only showing top 2 rows
I am wondering if it is possible to access the array elements in each cell? For example, I want to retrieve (Max, white), (Spot, Black) and (Michael, Black) from the dogs column.
In additional, I would like to expand the rows with n elements to n rows if possible.
Thanks!
Upvotes: 2
Views: 3570
Reputation: 11593
You can use explode as below to get access to a dataframe with each row being a record from the array.
data.registerTempTable("data")
dataExplode = sqlContext.sql("select explode(dogs) as dog from data")
dataExplode.show()
Then, you can use select to obtain just the columns you are interested in.
Upvotes: 3