Reputation: 316
I have an column containing an array of phone numbers represented as structs, and need to put them into three columns by a "type
" attribute (phone1, phone2, fax)
.
Here are two sample values of the column.
[{"number":"000-000-0000","type":"Phone1"},{"number":"000-000-0001","type":"Phone2"},{"number":"000-000-0002","type":"Fax"}]
[{"number":"000-000-1000","type":"Phone1"},{"number":"000-000-1001","typeCode":"Fax"},{"number":"000-000-1002","type":"Phone2"}]
I want to split each into three columns, one for each type. I want something like this:
Phone1 Phone2 Fax
000-000-0000 000-000-0001 000-000-0002
000-000-1000 000-000-1002 000-000-1001
This answer shows how to put each element of the array into its own column. How to explode an array into multiple columns in Spark
This gets me halfway there, but I can't rely on the order of items in the array. If I do this I'll get something like this, where the Phone2 and Fax values in the second column are out of place.
Phone1 Phone2 Fax
000-000-0000 000-000-0001 000-000-0002
000-000-1000 000-000-1001 000-000-1002
How can I split the single column value into three columns, using the type value? An array can have 0-3 numbers, but will never have more than one number of each type.
Upvotes: 1
Views: 785
Reputation: 22449
Here's one way which involves flattening the phone/fax#s via explode
, followed by pivoting on the typeCode
, as shown in the following example:
case class Contact(number: String, typeCode: String)
val df = Seq(
(1, Seq(Contact("111-22-3333", "Phone1"), Contact("111-44-5555", "Phone2"), Contact("111-66-7070", "Fax"))),
(2, Seq(Contact("222-33-4444", "Phone1"), Contact("222-55-6060", "Fax"), Contact("111-77-8888", "Phone2")))
).toDF("user_id", "contacts")
df.
withColumn("contact", explode($"contacts")).
groupBy($"user_id").pivot($"contact.typeCode").agg(first($"contact.number")).
show(false)
// +-------+-----------+-----------+-----------+
// |user_id|Fax |Phone1 |Phone2 |
// +-------+-----------+-----------+-----------+
// |1 |111-66-7070|111-22-3333|111-44-5555|
// |2 |222-55-6060|222-33-4444|111-77-8888|
// +-------+-----------+-----------+-----------+
Upvotes: 1