vkt

Reputation: 1459

Spark: rename DataFrame schema columns from dot to underscore

I have a dataframe with column names that contain dots. Example: df.printSchema

user.id_number
user.name.last
user.phone.mobile

etc., and I want to rename the schema by replacing the dots with _:

user_id_number
user_name_last
user_phone_mobile

Note: the input data for this DF is in JSON format (non-relational, NoSQL-like).
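
For reference, a DF with column names like this can be reproduced from JSON whose keys literally contain dots; a minimal sketch, assuming a SparkSession named spark:

// hypothetical reproduction of the schema above
import spark.implicits._

val json = Seq("""{"user.id_number":"1","user.name.last":"2","user.phone.mobile":"3"}""")
val df = spark.read.json(json.toDS)
df.printSchema
// root
//  |-- user.id_number: string (nullable = true)
//  |-- user.name.last: string (nullable = true)
//  |-- user.phone.mobile: string (nullable = true)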

Upvotes: 2

Views: 991

Answers (1)

notNull

Reputation: 31520

Use .toDF with .map, selectExpr, or .withColumnRenamed to replace . with _.

1. Using .toDF with .map:

// assumes a SparkSession with spark.implicits._ in scope (as in spark-shell)
val df = Seq(("1","2","3")).toDF("user.id_number","user.name.last","user.phone.mobile")
df.toDF(df.columns.map(x => x.replace(".","_")): _*).show()

// same result using replaceAll with a regex
df.toDF(df.columns.map(x => x.replaceAll("\\.","_")): _*).show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
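
A quick check against the example df that only the names change (sketch; same data, new schema):

val renamed = df.toDF(df.columns.map(_.replace(".","_")): _*)
renamed.printSchema
// root
//  |-- user_id_number: string (nullable = true)
//  |-- user_name_last: string (nullable = true)
//  |-- user_phone_mobile: string (nullable = true)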

2. Using selectExpr:

import org.apache.spark.sql.functions.col

// build aliased SQL expression strings for selectExpr; the backticks keep the dotted name as one identifier
val expr = df.columns.map(x => col(s"`${x}`").alias(x.replace(".","_")).toString)

df.selectExpr(expr:_*).show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
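
The backticks in col(s"`${x}`") are what make this work: without them Spark treats the dot as struct-field access and cannot resolve the column. A small sketch of the difference, using the same df:

import org.apache.spark.sql.functions.col

df.select(col("`user.id_number`")).show()   // backticks: resolves the literal dotted column name
// df.select(col("user.id_number")).show()  // no backticks: fails, Spark looks for a field id_number inside a column named user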

3. Using .withColumnRenamed:

// fold over the columns, renaming each one in turn
df.columns.foldLeft(df){ (tmpdf, col) => tmpdf.withColumnRenamed(col, col.replace(".","_")) }.show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
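
If this needs to be reused, the foldLeft version wraps naturally into a small helper (renameDotted is just an illustrative name):

import org.apache.spark.sql.DataFrame

def renameDotted(df: DataFrame): DataFrame =
  df.columns.foldLeft(df)((tmpdf, c) => tmpdf.withColumnRenamed(c, c.replace(".", "_")))

renameDotted(df).printSchema  // all dots in column names replaced with underscores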

Upvotes: 2
