Reputation: 595
I am trying to select specific columns from a Spark DataFrame. The list of columns is:
required_cols = ['123ABC.PM', '456DEF.PM']
spark_df has the following format:
'123ABC.PM'   '54SWC.PM'   '456DEF.PM'   '154AS.LB'
23.5          34.5         400.7         100.3
25.4          37.6         401           100
and so on.
I have already tried:
spark_df_new = spark_df.select(required_cols)
But I get the following error:
cannot resolve '`123ABC.PM`' given input columns: [123ABC.PM, 54SWC.PM, 456DEF.PM, 154AS.LB]
Upvotes: 0
Views: 2017
Reputation: 1
You need to use *.
The * operator unpacks your list and passes its elements to select one by one as separate arguments.
spark_df_new = spark_df.select(*required_cols)
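Because the column names contain dots, Spark may still parse them as struct field references; a minimal sketch (assuming PySpark) that additionally wraps each name in backticks before the select:
# Backtick-escape each name so the dot is not read as struct access.
escaped_cols = ['`{}`'.format(c) for c in required_cols]
spark_df_new = spark_df.select(*escaped_cols)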
Upvotes: 0
Reputation: 894
As a workaround, you can try the approach below:
rename the columns that contain special characters, then do the select.
// Build a regex of the special characters and compute the new names.
val columns = df.columns
val regex = """[+._,' ]+"""
val replacingColumns = columns.map(regex.r.replaceAllIn(_, "_"))
// Rename one column at a time, pairing each new name with its old one.
val resultDF = replacingColumns.zip(columns)
  .foldLeft(df) { (tempdf, name) => tempdf.withColumnRenamed(name._2, name._1) }
resultDF.show(false)
or
df
  .columns
  .foldLeft(df) { (newdf, colname) =>
    newdf.withColumnRenamed(colname, colname.replace(" ", "_").replace(".", "_"))
  }
Source: Stack Overflow
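The same rename-then-select workaround as a minimal PySpark sketch, assuming the spark_df and required_cols from the question:
# Rename every column, replacing dots with underscores
# (adapt the replacement to whatever special characters you have).
renamed_df = spark_df
for c in spark_df.columns:
    renamed_df = renamed_df.withColumnRenamed(c, c.replace('.', '_'))
# Select using the correspondingly renamed required columns.
spark_df_new = renamed_df.select([c.replace('.', '_') for c in required_cols])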
Upvotes: 0