ITo

Reputation: 55

Rename columns in a PySpark dataframe by adding a string

I have written code in Python using Pandas that adds "VEN_" to the beginning of the column names:

Tablon.columns = "VEN_" + Tablon.columns

And it works fine, but now I'm working with PySpark and it doesn't work. I've tried:

Vaa_total.columns = ['Vaa_' + col for col in Vaa_total.columns]

or

for elemento in Vaa_total.columns:
    elemento = "Vaa_" + elemento

And other things like that, but nothing works.

I don't want to replace the column names; I just want to keep them and add a string to the beginning.

Upvotes: 5

Views: 4207

Answers (3)

sargupta

Reputation: 1043

A standard way of writing it, renaming every column in a single pass:

from functools import reduce

renamed_df = reduce(lambda acc, col_name: acc.withColumnRenamed(col_name, "insert_text" + col_name), df.columns, df)

Upvotes: 0

vvg

Reputation: 6395

I linked a similar topic in a comment. Here's an example adapted from that topic to your task:

from pyspark.sql.functions import col

dataframe.select([col(col_name).alias('VAA_' + col_name) for col_name in dataframe.columns])
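
For context, here is a minimal end-to-end sketch of this select/alias approach, assuming a SparkSession named spark; the sample data and column names are made up just for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data just to demonstrate the rename
dataframe = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Build a new DataFrame whose columns carry the 'VAA_' prefix
renamed = dataframe.select([col(c).alias("VAA_" + c) for c in dataframe.columns])

print(renamed.columns)  # ['VAA_id', 'VAA_value']

Note that select returns a new DataFrame; the original dataframe keeps its old column names.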

Upvotes: 0

ags29

Reputation: 2696

Try something like this:

for elemento in Vaa_total.columns:
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)
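
Since Spark DataFrames are immutable, each withColumnRenamed call returns a new DataFrame, which is why the loop reassigns Vaa_total on every iteration. A self-contained sketch of the same loop, assuming a SparkSession named spark and hypothetical sample data in place of the real Vaa_total:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame standing in for Vaa_total
Vaa_total = spark.createDataFrame([(1, 10.0), (2, 20.0)], ["id", "amount"])

# Rename each column in turn, keeping the original name and prepending 'Vaa_'
for elemento in Vaa_total.columns:
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)

print(Vaa_total.columns)  # ['Vaa_id', 'Vaa_amount']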

Upvotes: 4
