ilovetolearn


Spark 2 (Python): rename columns and set column data types

I am using a DataFrame to read HDFS files and extract the data with regular expressions.

The column names are generated dynamically from an index, and every column is created with a string data type.

Is it possible to redefine the DataFrame's schema without renaming or casting each column individually?

My plan is to convert the DataFrame to an RDD and then convert the RDD back to a DataFrame with a new schema, but I am not sure whether this is a good idea.



Answers (1)

Neeraj Bhadani


If your DataFrame has only a few columns (say, 5) and you want to rename all of them, you can use the toDF() function:

Old column names: A, B, C, D, E. New column names: V, W, X, Y, Z.

newdf = df.toDF("V", "W", "X", "Y", "Z")

newdf now has the new column names.

To rename a single column, use the withColumnRenamed() function:

newdf = df.withColumnRenamed("current_name", "new_name")

Hope it helps.

