I have the following Scala Spark code to parse a fixed-width text file:
val schemaDf = df.select(
df("value").substr(0, 6).cast("integer").alias("id"),
df("value").substr(7, 6).alias("date"),
df("value").substr(13, 29).alias("string")
)
I'd like to extract the following code:
df("value").substr(0, 6).cast("integer").alias("id"),
df("value").substr(7, 6).alias("date"),
df("value").substr(13, 29).alias("string")
into a dynamic loop, so that the column parsing can be defined in some external configuration, something like this (where x will hold the config for each column parsing, but for now it is just simple numbers for demo purposes):
val x = List(1, 2, 3)
val df1 = df.select(
x.foreach {
df("value").substr(0, 6).cast("integer").alias("id")
}
)
but right now the line df("value").substr(0, 6).cast("integer").alias("id")
doesn't compile, with the following error:
type mismatch; found : org.apache.spark.sql.Column required: Int ⇒ ?
What am I doing wrong, and how do I properly return a dynamic Column list inside the df.select
method?
Upvotes: 1
select won't take a statement as input, but you can save off the Columns you want to create and then expand that list as the input for select:
val x = List(1, 2, 3)
val cols: List[Column] = x.map { i =>
  // in practice, each i would carry the offsets, type, and alias for its column
  df("value").substr(0, 6).cast("integer").alias("id")
}
val df1 = df.select(cols: _*)
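For the configuration-driven parsing the question asks about, each list element can carry the start position, length, target type, and alias of one field. A minimal sketch of that idea, where the FieldSpec case class and its field values are illustrative placeholders rather than anything from the original post:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical per-column parsing config; in a real setup this would be
// loaded from an external configuration source.
case class FieldSpec(name: String, start: Int, len: Int, castTo: Option[String])

val specs = List(
  FieldSpec("id", 0, 6, Some("integer")),
  FieldSpec("date", 7, 6, None),
  FieldSpec("string", 13, 29, None)
)

// Turn one spec into a Column expression against the raw "value" column.
def toColumn(df: DataFrame)(spec: FieldSpec): Column = {
  val raw = df("value").substr(spec.start, spec.len)
  spec.castTo.fold(raw)(t => raw.cast(t)).alias(spec.name)
}

val df1 = df.select(specs.map(toColumn(df)): _*)
```

Because select takes Column* varargs, the mapped list is expanded with : _*, exactly as in the answer above; adding a new field then only means adding a new FieldSpec entry.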
Upvotes: 2