Reputation: 155
I'm trying to fetch unique values of a column in a table and print it along side the other columns such as sum, tablename as shown in the query below. The subquery shown below is fetching multiple rows which is valid but in order to put that in a column, I need to flatten it out as a list or text, e.g [1,2,1]. How can I flatten this data out to fit in the same query as shown below. My preferred approach is to use Spark SQL. Please help.
**Subquery**
(select distinct $field from tablename)
**Actual Code**
RDDdf.createOrReplaceTempView(“tablename”)
val x= RDDdf.schema.fieldNames
val dfs = x.map(field => spark.sql(s"select 'RDDdf' as TableName,
'$field'as column, sum($field) as Sum, (select distinct $field from
tablename) from tablename"))
**Expected output**
TableName | column | Sum | UniqueVal
----------+---------+-----+----------
RDDdf | a | 4 | 1,2,1
Upvotes: 2
Views: 726
Reputation: 16076
Use built-in function collect_list:
select collect_list(value) from (select distinct($field) as value from tablename) as values
Be aware, that this will be VERY slow on large datasets
Upvotes: 2