Niko

Reputation: 800

Spark: Merge columns of the same dataframe without creating additional dataframes

I have the following data frame

+--------------------+-------------------+-------------+
|                uid2|               uid1|    timestamp|
+--------------------+-------------------+-------------+
|a                   |b                  |1589505008851|
|c                   |d                  |1589505012502|
|e                   |f                  |1589505016153|
+--------------------+-------------------+-------------+

and I want to create something like that

+--------------------+-------------------+
|                uids|          timestamp|
+--------------------+-------------------+
|a                   |1589505008851      |
|c                   |1589505012502      |
|e                   |1589505016153      |
|b                   |1589505008851      |
|d                   |1589505012502      |
|f                   |1589505016153      |
+--------------------+-------------------+

so I would like to merge the uid1 and uid2 columns into a single column. Both columns have exactly the same length and the same data type. Can I do this without creating an additional dataframe and "unioning" the two, just by referencing the columns?

Upvotes: 0

Views: 50

Answers (2)

Shrey Jakhmola

Reputation: 532

You can do that as follows.

Generate some sample data (note that `toDF` and the `$` column syntax require `import spark.implicits._`):

import spark.implicits._

val data = Seq(("a", "b", "1589505008851"), ("c", "d", "1589505008852"), ("e", "f", "1589505008854"))
val rdd = spark.sparkContext.parallelize(data)
var df = rdd.toDF("uid2", "uid1", "timestamp")

Then select each uid column under the common name "uids" and union the two projections:

df = df.select($"uid2".as("uids"), $"timestamp")
  .union(df.select($"uid1".as("uids"), $"timestamp"))

Upvotes: 0

Raphael Roth

Reputation: 27373

Use the explode/array approach:

import org.apache.spark.sql.functions.{array, explode}

df
  .select(explode(array($"uid1", $"uid2")).as("uids"), $"timestamp")
  .show()
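For reference, a minimal self-contained sketch of this approach (assuming a local SparkSession named `spark`; the session setup and sample data are illustrative, not part of the original answer):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, explode}

val spark = SparkSession.builder().master("local[*]").appName("merge-uids").getOrCreate()
import spark.implicits._

val df = Seq(("a", "b", "1589505008851"), ("c", "d", "1589505012502"), ("e", "f", "1589505016153"))
  .toDF("uid2", "uid1", "timestamp")

// array() packs uid1 and uid2 into one array column per row;
// explode() then emits one output row per array element, so the
// flattening happens in a single select with no second DataFrame
// and no union.
df.select(explode(array($"uid1", $"uid2")).as("uids"), $"timestamp").show()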

Upvotes: 1
