Reputation: 4978
I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name:
item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5
I want to generate a result where the names belonging to each item_id are concatenated:
item_id  names
1        name1-name2-name3
2        name4-name5
How do I create such a table with Spark SQL?
Upvotes: 2
Views: 157
Reputation: 74619
The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL), you can usually figure out the other variants.
The following SQL statement seems to do what you ask for:
SELECT item_id, array_join(collect_list(name), '-') AS names
FROM tableName
GROUP BY item_id
In spark-shell it gives the following result:
scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id| names|
+-------+-----------------+
| 1|name1-name2-name3|
| 2| name4-name5|
+-------+-----------------+
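Since the goal is to create a table, the query can be persisted with a CREATE TABLE ... AS SELECT. A minimal sketch, assuming the source data lives in tableName and using item_names as a placeholder name for the result table:

-- item_names is a placeholder name for the result table
CREATE TABLE item_names AS
SELECT item_id, array_join(collect_list(name), '-') AS names
FROM tableName
GROUP BY item_id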
Upvotes: 2
Reputation: 56
You can use the DataFrame groupBy and agg methods together with the concat_ws function:
import org.apache.spark.sql.functions.{collect_list, concat_ws}
import spark.implicits._  // for the $"..." column syntax

df.groupBy($"item_id")
  .agg(concat_ws("-", collect_list($"name")).alias("names"))
  .show()
This groups the rows by item_id and aggregates the name values by concatenating them together.
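Since the question is restricted to SQL, the same approach translates directly to a Spark SQL statement. A sketch, assuming the table is registered as tableName:

SELECT item_id, concat_ws('-', collect_list(name)) AS names
FROM tableName
GROUP BY item_id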
Upvotes: 0
Reputation: 37473
You can try the below:
from pyspark.sql.functions import array_join, collect_list

(df.orderBy('name')  # source column is 'name'; sort ascending to match the expected output
   .groupBy('item_id')
   .agg(
       array_join(
           collect_list('name'),
           '-',
       ).alias('names')
   )
   .show())
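One caveat: the row order established by orderBy is not guaranteed to survive the groupBy shuffle, so collect_list may assemble the names in any order. If ordering matters, sorting the collected array is more reliable. A sketch in Spark SQL, assuming the same tableName:

SELECT item_id, array_join(sort_array(collect_list(name)), '-') AS names
FROM tableName
GROUP BY item_id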
Upvotes: 1