Reputation: 1193
I have a Spark (Scala) dataframe where a few of the columns are optional, that is, sometimes they don't exist. Is there a very simple way to modify my df.select statement so that Spark doesn't care that a column might not exist?
For example, right now I have:
df.select(Seq(col("col1"), col("optionalCol"), col("col2")): _*)
I was hoping there would be some kind of "optional" designation.
Upvotes: 5
Views: 7976
Reputation: 841
You can take the columns you care about, then filter out the ones that don't exist in the dataframe:
import org.apache.spark.sql.functions.col

val dfColumns = df.columns.toSet  // a Set[String] doubles as a String => Boolean predicate
val columns: Seq[String] = Seq("col1", "optionalCol", "col2").filter(dfColumns)
df.select(columns.map(col): _*)   // expand to varargs; missing columns are simply skipped
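If you need this in several places, the same idea fits in a small helper (a minimal sketch; selectExisting is a hypothetical name, not part of the Spark API):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: select only the requested columns that actually exist.
def selectExisting(df: DataFrame, wanted: Seq[String]): DataFrame =
  df.select(wanted.filter(df.columns.toSet).map(col): _*)

// Usage: missing columns are silently dropped from the projection.
val result = selectExisting(df, Seq("col1", "optionalCol", "col2"))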
Upvotes: 5
Reputation: 4333
You can check whether the column exists with df.columns (see the Spark Dataset docs) and branch on the result:
// returns true if the column exists, false otherwise
if (df.columns.contains("optionalCol")) {
  df.select(Seq(col("col1"), col("optionalCol"), col("col2")): _*)
} else {
  df.select(Seq(col("col1"), col("col2")): _*)
}
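If you'd rather keep optionalCol in the output schema either way, a related pattern (an assumption about the use case, not part of the answer above) is to inject a typed null when the column is missing:

import org.apache.spark.sql.functions.{col, lit}

// Add optionalCol as a null literal when the dataframe lacks it
// (the string type here is an assumption; use the column's real type).
val withOptional =
  if (df.columns.contains("optionalCol")) df
  else df.withColumn("optionalCol", lit(null).cast("string"))

withOptional.select(col("col1"), col("optionalCol"), col("col2"))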
Upvotes: 1