SheCodes

Reputation: 595

How to select list of specific columns (which contain special characters) from pyspark dataframe?

I am trying to select specific columns from a Spark dataframe.

The specific column list is:

required_cols = ['123ABC.PM','456DEF.PM']

spark_df is in the following format:

'123ABC.PM'   '54SWC.PM'   '456DEF.PM'   '154AS.LB'
23.5          34.5         400.7         100.3
25.4          37.6         401           100
and so on

I have already tried:

spark_df_new = spark_df.select(required_cols)

But I am getting this error:

"cannot resolve '`123ABC.PM`' given input columns: [123ABC.PM,54SWC.PM, 456DEF.PM,154AS.LB]
``

Upvotes: 0

Views: 2017

Answers (3)

Anuj Dutt

Reputation: 1

You need to use *.
The * unpacks the list, passing each column name to select() as a separate argument.

spark_df_new = spark_df.select(*required_cols)
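For illustration, a self-contained sketch of what the unpacking does (toy data and plain column names; the dotted names from the question additionally need the backtick escaping shown in the last answer):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
cols = ['a', 'c']

# select(*cols) expands the list, i.e. it is equivalent to df.select('a', 'c')
df.select(*cols).show()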

Upvotes: 0

data_addict

Reputation: 894

As a workaround, you can try the approach below.

Rename the columns whose names contain special characters, then do the select:

val columns = df.columns

// collapse any run of special characters in a name into "_"
val regex = """[+._,' ]+"""
val replacingColumns = columns.map(regex.r.replaceAllIn(_, "_"))

// rename each original column (name._2) to its cleaned form (name._1)
val resultDF = replacingColumns.zip(columns).foldLeft(df) { (tempdf, name) =>
  tempdf.withColumnRenamed(name._2, name._1)
}

resultDF.show(false)

or

df
  .columns
  .foldLeft(df) { (newdf, colname) =>
    // replace spaces and dots in each name with underscores
    newdf.withColumnRenamed(colname, colname.replace(" ", "_").replace(".", "_"))
  }

Source: SO
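The snippets above are Scala, while the question is PySpark. A rough PySpark translation of the same rename-then-select idea (a sketch, assuming spark_df from the question):

# rename every column in one pass, replacing dots with underscores
renamed_df = spark_df.toDF(*[c.replace('.', '_') for c in spark_df.columns])

# the required columns can now be selected under their cleaned names
renamed_df.select('123ABC_PM', '456DEF_PM').show()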

Upvotes: 0

Ranga Vure

Reputation: 1932

Use the backtick (`) character around each name:

required_cols = ['`123ABC.PM`', '`456DEF.PM`']
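Applied to the original select, a minimal sketch (assuming spark_df from the question) that wraps each name in backticks programmatically:

required_cols = ['123ABC.PM', '456DEF.PM']

# escape the dots by wrapping each name in backticks before selecting
spark_df_new = spark_df.select([f'`{c}`' for c in required_cols])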

Upvotes: 1
