SLU
SLU

Reputation: 81

How to merge two or more columns into one?

I have a streaming Dataframe that I want to calculate min and avg over some columns.

Instead of getting separate resulting columns of min and avg after applying the operations, I want to merge the min and average output into a single column.

The dataframe look like this:

+-----+-----+
|  1  |  2  |           
+-----+-----+-
|24   |  55 |
+-----+-----+
|20   |  51 |
+-----+-----+

I thought I'd use a Scala tuple for it, but that does not seem to work:

val res = List("1","2").map(name => (min(col(name)), avg(col(name))).as(s"result($name)"))

All code used:

val res = List("1","2").map(name => (min(col(name)),avg(col(name))).as(s"result($name)"))

val groupedByTimeWindowDF1 = processedDf.groupBy($"xyz", window($"timestamp", "60 seconds"))
  .agg(res.head, res.tail: _*) 

I'm expecting the output after applying the min and avg mathematical opearations to be:

+-----------+-----------+
|  result(1)|  result(2)|           
+-----------+-----------+
|20 ,22     |  51,53    |
+-----------+-----------+

How I should write the expression?

Upvotes: 1

Views: 407

Answers (2)

Jacek Laskowski
Jacek Laskowski

Reputation: 74619

Use struct standard function:

struct(colName: String, colNames: String*): Column

struct(cols: Column*): Column

Creates a new struct column that composes multiple input columns.

That gives you the values as well as the names (of the columns).

val res = List("1","2").map(name => 
  struct(min(col(name)), avg(col(name))) as s"result($name)")
  ^^^^^^ HERE

The power of struct can be seen when you want to reference one field in the struct and you can use the name (not index).

q.select("structCol.name")

Upvotes: 2

Shaido
Shaido

Reputation: 28322

What you want to do is to merge the values of multiple columns together in a single column. For this you can use the array function. In this case it would be:

val res = List("1","2").map(name => array(min(col(name)),avg(col(name))).as(s"result($name)"))

Which will give you :

+------------+------------+
|   result(1)|   result(2)|
+------------+------------+
|[20.0, 22.0]|[51.0, 53.0]|
+------------+------------+

Upvotes: 1

Related Questions