Reputation: 2659
val columnName=Seq("col1","col2",....."coln");
Is there a way to use dataframe.select to get a dataframe containing only the specified columns?
I know I can do dataframe.select("col1","col2"...), but columnName is generated at runtime.
I could call dataframe.select() repeatedly for each column name in a loop. Will that have any performance overhead? Is there a simpler way to accomplish this?
Upvotes: 42
Views: 101199
Reputation: 938
You can pass (List(F.col("*")) ++ updatedColumns): _* to select.
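// F is assumed to be an alias for org.apache.spark.sql.functions
import org.apache.spark.sql.{Column, functions => F}
import org.apache.spark.sql.types.IntegerType

// inputColumnNames, firstAllowedDate, lastAllowedDate and inputSDF are defined elsewhere
val updatedColumns: List[Column] = inputColumnNames.map(x => (F.col(x) * F.col("is_t90d")).alias(x))
val outputSDF = {
  inputSDF
    .withColumn("is_t90d", F.col("original_date").between(firstAllowedDate, lastAllowedDate).cast(IntegerType))
    .select( // select existing and additional columns
      (List(F.col("*")) ++ updatedColumns): _*
    )
}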
Upvotes: 0
Reputation: 367
Alternatively, you can write it like this:
val columnName = Seq("col1", "col2")
val DFFiltered = DF.select(columnName.map(DF(_)): _*)
Upvotes: -1
Reputation: 1419
Since dataFrame.select() expects Column arguments and we have a sequence of strings, we need to convert each string to a Column with col. columnName.map(name => col(name)) gives a sequence of columns from the sequence of strings, and the : _* suffix passes that sequence as the variable-length argument list of select():
val columnName = Seq("col1", "col2")
val DFFiltered = DF.select(columnName.map(name => col(name)): _*)
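The : _* suffix is plain Scala rather than anything Spark-specific. As a minimal sketch of what it does, the example below uses a hypothetical pick function (not part of Spark) declared with a repeated parameter, the same shape as select(cols: Column*):

```scala
object VarargsSketch {
  // Hypothetical stand-in for a method with a repeated (varargs) parameter.
  def pick(names: String*): String = names.mkString(", ")

  def main(args: Array[String]): Unit = {
    val columnName = Seq("col1", "col2")

    // pick(columnName) would not compile: a Seq[String] is not a String.
    // The ": _*" ascription expands the Seq into individual arguments.
    println(pick(columnName: _*)) // prints "col1, col2"
  }
}
```

The same expansion is what lets a Seq built at runtime be handed to select().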
Upvotes: 8
Reputation: 37852
val columnNames = Seq("col1","col2",....."coln")
// using the string column names:
val result = dataframe.select(columnNames.head, columnNames.tail: _*)
// or, equivalently, using Column objects (requires import org.apache.spark.sql.functions.col):
val result2 = dataframe.select(columnNames.map(c => col(c)): _*)
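For anyone who wants to try both forms end to end, here is a rough self-contained sketch. It assumes Spark SQL is on the classpath and runs against a local session. The object name, helper names and sample data are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectDynamicColumns {
  // Uses the String overload: select(col: String, cols: String*).
  // Note: names.head throws if names is empty.
  def selectByName(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.head, names.tail: _*)

  // Uses the Column overload: select(cols: Column*).
  def selectByColumn(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.map(c => col(c)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("select-dynamic-columns")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with three columns.
    val dataframe = Seq((1, "a", true), (2, "b", false)).toDF("col1", "col2", "col3")

    // Column names that would normally be computed at runtime.
    val columnNames = Seq("col1", "col3")

    selectByName(dataframe, columnNames).printSchema()
    selectByColumn(dataframe, columnNames).printSchema()

    spark.stop()
  }
}
```

The String form needs the head/tail split because that overload takes one mandatory name followed by varargs, so it fails on an empty sequence. The Column form never calls head.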
Upvotes: 85