Abhishek Tripathi

Reputation: 1381

Adding a column of type Array[long] from an existing column's value in a DataFrame

I am using Spark 2.0 and have a use case where I need to convert a column's type from string to Array[long].

Suppose I have a dataframe with schema :

root
 |-- unique_id: string (nullable = true)
 |-- column2 : string (nullable = true)

DF :

+----------+---------+
|unique_id | column2 |
+----------+---------+
|  1       |  123    |
|  2       |  125    |
+----------+---------+

Now I want to add a new column named "column3" of type Array[long] holding the values from "column2", like:

root
 |-- unique_id: string (nullable = true)
 |-- column2: long (nullable = true)
 |-- column3: array (nullable = true)
 |    |-- element: long (containsNull = true)

new DF :

+----------+---------+---------+
|unique_id | column2 | column3 |
+----------+---------+---------+
|  1       |  123    | [123]   | 
|  2       |  125    | [125]   |
+----------+---------+---------+

Is there a way to achieve this?

Upvotes: 0

Views: 388

Answers (1)

Ramesh Maharjan

Reputation: 41957

You can simply use withColumn with the array function:

df.withColumn("column3", array(df("column2")))

I also see that you are trying to change column2 from string to long. A simple udf function should do the trick, so the final solution would be:

def changeToLong = udf((str: String) => str.toLong)


val finalDF = df
  .withColumn("column2", changeToLong(col("column2")))
  .withColumn("column3", array(col("column2")))

You also need to import the functions library:

import org.apache.spark.sql.functions._
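As a hedged alternative sketch (not from the answer above): instead of a udf, Spark's built-in Column.cast can convert the string column to long. A minimal end-to-end example, assuming a local SparkSession and the sample data from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session just for this sketch
val spark = SparkSession.builder()
  .appName("array-column-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Recreate the question's DataFrame
val df = Seq(("1", "123"), ("2", "125")).toDF("unique_id", "column2")

// Built-in cast instead of a udf, then wrap the long value in an array
val finalDF = df
  .withColumn("column2", col("column2").cast("long"))
  .withColumn("column3", array(col("column2")))

finalDF.printSchema()
finalDF.show()
```

Both approaches produce the schema shown in the question; cast simply avoids the overhead of a user-defined function for a standard type conversion.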

Upvotes: 2
