Reputation: 1381
I am using spark 2.0 and have a use case where I need to convert the attribute type of a column from string to Array[long].
Suppose I have a dataframe with schema :
root
|-- unique_id: string (nullable = true)
|-- column2 : string (nullable = true)
DF :
+----------+---------+
|unique_id | column2 |
+----------+---------+
| 1 | 123 |
| 2 | 125 |
+----------+---------+
now i want to add a new column with name "column3" of type Array[long]having the values from "column2" like :
root
|-- unique_id: string (nullable = true)
|-- column2: long (nullable = true)
|-- column3: array (nullable = true)
| |-- element: long (containsNull = true)
new DF :
+----------+---------+---------+
|unique_id | column2 | column3 |
+----------+---------+---------+
| 1 | 123 | [123] |
| 2 | 125 | [125] |
+----------+---------+---------+
I there a way to achieve this ?
Upvotes: 0
Views: 388
Reputation: 41957
You can simply use withColumn
and array
function as
df.withColumn("column3", array(df("columnd")))
And I also see that you are trying to change the column2
from string
to Long
. A simple udf
function should do the trick. So final solution would be
def changeToLong = udf((str: String) => str.toLong)
val finalDF = df
.withColumn("column2", changeToLong(col("column2")))
.withColumn("column3", array(col("column2")))
You need to import functions library too as
import org.apache.spark.sql.functions._
Upvotes: 2