A Learner

Reputation: 177

How to change the column names of a DataFrame to upper case in Spark using Java

I have some column names in mixed case in my DataFrame, such as sum(TXN_VOL), and I want to convert them to upper case, e.g. SUM(TXN_VOL).

I won't know all the column names in advance, so I can't hard-code the conversion.

Do I have to iterate through all the column names and convert each one to upper case, or is there any built-in functionality to change all column names to upper case?

What I tried:

String[] columnNames = finalBcDF.columns();
Dataset<Row> x = null;
for (String columnName : columnNames) {
    // Each call starts again from finalBcDF, so x only keeps the last rename
    x = finalBcDF.withColumnRenamed(columnName, columnName.toUpperCase());
}

But this creates a new DataFrame on each iteration, so it doesn't give the desired result.

I have checked many sites, but I can't find how to do this in Java.

Can anyone help here?

EDIT

The answers to this question:

How to lower the case of column names of a data frame but not its values?

give solutions in Scala and PySpark, but I am not able to convert them to Java. Can anyone help?
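A possible Java translation of the Scala idiom df.toDF(df.columns.map(_.toLowerCase): _*) is sketched below. It assumes Spark 2.x or later, where Dataset.toDF(String... colNames) is a varargs method that Java can call with a plain String[]. Only the name-mapping step is runnable here; the helper name upperCaseAll is hypothetical, and the Spark call appears as a comment.

```java
import java.util.Arrays;
import java.util.Locale;

public class UpperCaseColumns {

    // Java counterpart of the Scala expression df.columns.map(_.toUpperCase):
    // map every column name to upper case, keeping the original order.
    static String[] upperCaseAll(String[] columns) {
        return Arrays.stream(columns)
                .map(name -> name.toUpperCase(Locale.ROOT))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // In a Spark job the input would come from df.columns(), and the result
        // would be passed straight to the varargs method Dataset.toDF(String...):
        //   Dataset<Row> renamed = df.toDF(upperCaseAll(df.columns()));
        String[] columns = {"sum(TXN_VOL)", "acct_id"};
        System.out.println(Arrays.toString(upperCaseAll(columns)));
    }
}
```

toDF renames columns by position and requires exactly one new name per existing column, which holds here because the new names are derived from df.columns() itself. Locale.ROOT keeps the upper-casing independent of the JVM's default locale.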

Upvotes: 1

Views: 4349

Answers (2)

abaghel

Reputation: 15297

Here is how you can convert the column names to upper case using Java 8.

import static org.apache.spark.sql.functions.col;
import java.util.Arrays;
import org.apache.spark.sql.Column;

// Alias every column to its upper-case name and select them all in one pass
df.select(Arrays.stream(df.columns()).map(x -> col(x).as(x.toUpperCase())).toArray(Column[]::new)).show(false);
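One caveat worth checking before relying on col(x): col parses its argument as a column reference, so a name containing a dot, such as a.b, is read as a nested field access rather than as a single column. Wrapping the name in backticks is the usual workaround. The helper below (quoted, a hypothetical name) sketches this in plain Java, assuming Spark's convention that a literal backtick inside a quoted name is written as two backticks; the Spark call is left as a comment.

```java
import java.util.Locale;

public class QuotedNames {

    // Hypothetical helper: wrap a column name in backticks so that col(...)
    // treats it as one identifier. Assumes an embedded backtick is escaped
    // by doubling it, as in Spark SQL quoted identifiers.
    static String quoted(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        // In the answer's stream this would become:
        //   .map(x -> col(quoted(x)).as(x.toUpperCase()))
        for (String name : new String[] {"sum(TXN_VOL)", "a.b"}) {
            System.out.println(quoted(name) + " -> " + name.toUpperCase(Locale.ROOT));
        }
    }
}
```

The alias passed to as(...) is used verbatim, so only the lookup side needs quoting.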

Upvotes: 2

Prakash Bhagat

Reputation: 1446

Iterating is a fine approach, as long as each rename is applied to the result of the previous one. Even though a new DataFrame instance is created on every call, Spark evaluates lazily, so there is no real performance penalty.

Reference: https://data-flair.training/blogs/apache-spark-lazy-evaluation/
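The failure in the question's loop can be illustrated without Spark. The sketch below uses a hypothetical immutable Frame class as a stand-in for Dataset<Row> (it is not a Spark type): like withColumnRenamed, each rename returns a new object, leaves the receiver untouched, and ignores unknown names. Starting every iteration from the original keeps only the last rename, while renaming the latest result accumulates all of them. In Spark terms, the fix would be to initialise x = finalBcDF and call x = x.withColumnRenamed(...) inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class RenameLoop {

    /** Hypothetical stand-in for an immutable Dataset: renaming returns a new instance. */
    static final class Frame {
        private final List<String> columns;

        Frame(String... columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(columns)));
        }

        String[] columns() {
            return columns.toArray(new String[0]);
        }

        // Same shape as withColumnRenamed: the receiver is never modified,
        // and renaming a column that does not exist is a no-op.
        Frame withColumnRenamed(String existing, String renamed) {
            List<String> copy = new ArrayList<>(columns);
            int i = copy.indexOf(existing);
            if (i >= 0) {
                copy.set(i, renamed);
            }
            return new Frame(copy.toArray(new String[0]));
        }
    }

    // Pattern from the question: every call starts from `original`,
    // so only the rename made in the last iteration survives.
    static Frame renameFromOriginal(Frame original) {
        Frame x = original;
        for (String c : original.columns()) {
            x = original.withColumnRenamed(c, c.toUpperCase(Locale.ROOT));
        }
        return x;
    }

    // Fix: keep renaming the latest result so the renames accumulate.
    static Frame renameChained(Frame original) {
        Frame x = original;
        for (String c : original.columns()) {
            x = x.withColumnRenamed(c, c.toUpperCase(Locale.ROOT));
        }
        return x;
    }

    public static void main(String[] args) {
        Frame df = new Frame("sum(TXN_VOL)", "acct_id");
        System.out.println(Arrays.toString(renameFromOriginal(df).columns())); // [sum(TXN_VOL), ACCT_ID]
        System.out.println(Arrays.toString(renameChained(df).columns()));      // [SUM(TXN_VOL), ACCT_ID]
    }
}
```

Creating a new object per rename is not the problem in itself; what matters is which object the next rename is applied to.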

Upvotes: 0
