B. Smith

Reputation: 1193

How to select from dataframe when column is optional

I have a Spark (Scala) dataframe where a few of the columns are optional, that is, sometimes they don't exist. Is there a very simple way to modify my df.select statement so that Spark doesn't care that a column might not exist?

For example, right now I have: df.select(Seq(col("col1"), col("optionalCol"), col("col2")): _*).

I was hoping there would be some kind of "optional" designation.

Upvotes: 5

Views: 7976

Answers (2)

Ethan

Reputation: 841

You can take the columns you might care about and filter out the ones that don't exist in the dataframe.

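import org.apache.spark.sql.functions.col
val dfColumns = df.columns.toSet
// Keep only the requested names that actually exist in the dataframe
val columns: Seq[String] = Seq("col1", "optionalCol", "col2").filter(dfColumns)
df.select(columns.map(col): _*)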
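
If you need this in more than one place, the filtering step above can be pulled out into a small helper. This is only a minimal sketch: OptionalColumns and presentColumns are hypothetical names, not part of Spark, and the helper works on plain column names so it has no Spark dependency of its own.

```scala
object OptionalColumns {
  // Hypothetical helper (not a Spark API): returns the wanted column names
  // that are present in `available`, preserving the order of `wanted`.
  // The comparison is an exact, case-sensitive string match.
  def presentColumns(wanted: Seq[String], available: Seq[String]): Seq[String] = {
    val availableSet = available.toSet
    wanted.filter(availableSet.contains)
  }
}
```

Assuming col is imported from org.apache.spark.sql.functions, it could then be used as df.select(OptionalColumns.presentColumns(Seq("col1", "optionalCol", "col2"), df.columns).map(col): _*). Since the match is case-sensitive, the names you ask for have to be spelled exactly as they appear in df.columns.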

Upvotes: 5

tourist

Reputation: 4333

From Spark docs

You can use the following to check if the column exists or not

// df.columns.contains returns true if the column exists, false otherwise
if (df.columns.contains("optionalCol")) {
  df.select(col("col1"), col("optionalCol"), col("col2"))
} else {
  df.select(col("col1"), col("col2"))
}

Upvotes: 1
