I am very new to Scala and Spark.
I have read a text file into a DataFrame and successfully split its single column into several columns (the file is essentially a space-delimited CSV):
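```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.split
import spark.implicits._ // enables the $"colName" syntax
// With the default comma separator, each space-delimited line lands in column _c0
val irisDF: DataFrame = spark.read.csv("src/test/resources/iris-in.txt")
irisDF.show()
// Split _c0 on spaces and pull the first four items out as separate columns
val dfnew: DataFrame = irisDF.withColumn("_tmp", split($"_c0", " ")).select(
  $"_tmp".getItem(0).as("col1"),
  $"_tmp".getItem(1).as("col2"),
  $"_tmp".getItem(2).as("col3"),
  $"_tmp".getItem(3).as("col4")
).drop("_tmp") // no-op here: _tmp is not among the selected columns
```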
This works.
But what if I don't know how many columns the data file has? How can I generate the columns dynamically, depending on the number of items the `split` function produces?
You can build a sequence of column expressions and then pass all of them to the `select` method at once by expanding the sequence with the `:_*` syntax.
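The `: _*` ascription tells the Scala compiler to pass the elements of a sequence as individual arguments to a varargs parameter, which is what `select(cols: Column*)` expects. A minimal plain-Scala sketch of the mechanism, using a hypothetical `joinAll` method in place of `select`:

```scala
object VarargsDemo {
  // Hypothetical varargs method, shaped like DataFrame.select(cols: Column*)
  def joinAll(parts: String*): String = parts.mkString(",")

  def main(args: Array[String]): Unit = {
    // A sequence whose length is only known at runtime
    val names = (0 until 3).map(i => s"col$i")
    // `names: _*` expands the Seq into separate varargs arguments
    println(joinAll(names: _*)) // prints col0,col1,col2
  }
}
```

Without `: _*`, the call `joinAll(names)` would not compile, because a `Seq[String]` is not a `String`.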
Example data:
```scala
import spark.implicits._ // for toDF and $"..."
val df = Seq("a b c d", "e f g").toDF("c0")
df.show
```
```
+-------+
|     c0|
+-------+
|a b c d|
|  e f g|
+-------+
```
If you want five columns from the `c0` column (you need to know that number before building the expressions):
```scala
import org.apache.spark.sql.functions.split
// One getItem expression per wanted column; out-of-range items become null
val selectExprs = (0 until 5).map(i => $"temp".getItem(i).as(s"col$i"))
df.withColumn("temp", split($"c0", " ")).select(selectExprs: _*).show
```
```
+----+----+----+----+----+
|col0|col1|col2|col3|col4|
+----+----+----+----+----+
|   a|   b|   c|   d|null|
|   e|   f|   g|null|null|
+----+----+----+----+----+
```
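If the number of columns is not known in advance, one option is to compute it from the data first. The sketch below assumes Spark 2.x or 3.x and uses `functions.size` and `functions.max` to find the length of the longest split array, then builds that many `getItem` expressions. The names `DynamicSplit` and `splitToColumns` are hypothetical. This costs an extra pass over the data, and `getInt(0)` throws if the DataFrame is empty (the max is then null).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, split}

object DynamicSplit {
  // Hypothetical helper: splits `inputCol` on `sep` (a Java regex) and expands
  // it into as many columns as the longest row needs.
  def splitToColumns(df: DataFrame, inputCol: String, sep: String): DataFrame = {
    val withArr = df.withColumn("temp", split(col(inputCol), sep))
    // Extra pass over the data: largest number of items in any row
    val n = withArr.select(max(size(col("temp")))).head().getInt(0)
    // Shorter rows get null in their missing trailing columns
    val selectExprs = (0 until n).map(i => col("temp").getItem(i).as(s"col$i"))
    withArr.select(selectExprs: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dynamic-split").getOrCreate()
    import spark.implicits._
    val df = Seq("a b c d", "e f g").toDF("c0")
    splitToColumns(df, "c0", " ").show()
    spark.stop()
  }
}
```

For the example data above this yields four columns, `col0` to `col3`, with `null` in `col3` for the second row.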