Reputation: 800
I have the following data frame
+--------------------+-------------------+-------------+
| uid2| uid1| timestamp|
+--------------------+-------------------+-------------+
|a |b |1589505008851|
|c |d |1589505012502|
|e |f |1589505016153|
+--------------------+-------------------+-------------+
and I want to create something like this:
+--------------------+-------------------+
| uids| timestamp|
+--------------------+-------------------+
|a |1589505008851|
|c |1589505012502|
|e |1589505016153|
|b |1589505008851|
|d |1589505012502|
|f |1589505016153|
+--------------------+-------------------+
In other words, I would like to merge the uid1 and uid2 columns into a single column. Both columns have exactly the same length and the same data type. Can I do this without creating an additional DataFrame and unioning the two, i.e. just by referencing the columns?
Upvotes: 0
Views: 50
Reputation: 532
You can do that as follows. First, generate some sample data:
import spark.implicits._

val data = Seq(("a", "b", "1589505008851"), ("c", "d", "1589505012502"), ("e", "f", "1589505016153"))
val rdd = spark.sparkContext.parallelize(data)
var df = rdd.toDF("uid2", "uid1", "timestamp")
Then select each uid column (renamed to uids) together with the timestamp, and union the two projections:
df = df.select($"uid2".as("uids"), $"timestamp")
  .union(df.select($"uid1".as("uids"), $"timestamp"))
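Conceptually, the union approach stacks two projections of the same rows. A minimal plain-Scala sketch of that logic (using illustrative tuples in place of a real DataFrame, so it runs without Spark):

```scala
// Sketch of select-then-union on plain Scala collections.
// Each tuple stands for a row (uid2, uid1, timestamp); sample values only.
val rows = Seq(
  ("a", "b", 1589505008851L),
  ("c", "d", 1589505012502L),
  ("e", "f", 1589505016153L)
)

val uid2Part = rows.map { case (uid2, _, ts) => (uid2, ts) } // select(uid2, timestamp)
val uid1Part = rows.map { case (_, uid1, ts) => (uid1, ts) } // select(uid1, timestamp)
val uids = uid2Part ++ uid1Part                              // union of the two projections
```

As in Spark's union, the result simply concatenates the two projections, so all uid2 values come before all uid1 values, matching the desired output order.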
Upvotes: 0
Reputation: 27373
Use the explode / array approach:
df
.select(explode(array($"uid1",$"uid2")).as("uids"),$"timestamp")
.show()
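The same fan-out can be sketched in plain Scala with flatMap (illustrative tuples instead of a DataFrame, so no Spark needed):

```scala
// Sketch of explode(array(uid1, uid2)): each input row produces one output
// row per uid, each keeping the row's timestamp. Sample values only.
val rows = Seq(
  ("a", "b", 1589505008851L),
  ("c", "d", 1589505012502L),
  ("e", "f", 1589505016153L)
)

val exploded = rows.flatMap { case (uid2, uid1, ts) =>
  Seq((uid1, ts), (uid2, ts)) // array($"uid1", $"uid2") then explode
}
```

Note the row order differs from the union approach: explode interleaves the two uids per input row, whereas union emits all uid2 values first. If order matters, add an explicit sort.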
Upvotes: 1