pyspark: dataframe header transformation

Question

I am loading a CSV into a PySpark DataFrame. I am trying to remove spaces and other special characters such as "(", ")" and "/" from the column headers.

I could remove spaces from the column headers as shown below.

for col in df.columns:
  df = df.withColumnRenamed(col,col.replace(" ", "").replace("(", "").replace(")", "").replace("/", ""))

But this doesn't work: it removes only the spaces in the column names, not the special characters.

I tried the following and it works:

for col in df.columns:
  df = df.withColumnRenamed(col,col.replace(" ", "").replace("(", "").replace(")", "").replace("/", ""))

Is there a more elegant way of removing these characters? Thanks.

mck · Accepted Answer

Try the code below:

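# Characters to strip from every column header
to_replace = [" ", "(", ")", "/"]

for col in df.columns:
    col2 = col
    # Remove each unwanted character in turn
    for s in to_replace:
        col2 = col2.replace(s, "")
    # Rename only after the cleaned name is fully built
    df = df.withColumnRenamed(col, col2)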
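
As a possible alternative to the nested loop, the same cleanup can be sketched with one regular expression and a single `toDF` call that renames every column at once. This is only a sketch, not part of the original answer: the helper names `clean_header` and `clean_columns` are made up for illustration, and the character class assumes the same four characters listed in the question.

```python
import re

# Characters to strip from headers: space, "(", ")" and "/"
# (the same set as in the question; extend the class as needed).
_UNWANTED = re.compile(r"[ ()/]")


def clean_header(name):
    """Return the header with every unwanted character removed."""
    return _UNWANTED.sub("", name)


def clean_columns(df):
    """Rename all columns of a Spark DataFrame in one call.

    Hypothetical helper: it relies only on df.columns and df.toDF(*names).
    """
    return df.toDF(*[clean_header(c) for c in df.columns])


print(clean_header("Unit Price (USD/kg)"))  # UnitPriceUSDkg
```

With an existing DataFrame this would be used as `df = clean_columns(df)`. Note that two different headers can collapse to the same cleaned name (for example "a b" and "a/b"), which would likely make later column references ambiguous, so it may be worth checking for duplicates first.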
