Reputation: 1568
I have Dataframe with some columns:
+------+-------------+------+---------------+--------------+
|CustId| Name|Salary| State| Country|
+------+-------------+------+---------------+--------------+
| 1| Brad Eason| 100|New South Wales| Australia|
| 2|Tracy Hopkins| 200| England|United Kingdom|
| 3| Todd Boyes| 300| England|United Kingdom|
| 4| Roy Phan| 400| Minnesota| United States|
| 5| Harold Ryan| 500| Washington| United States|
+------+-------------+------+---------------+--------------+
To replace every space in the string columns with _, I made the following changes:
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.{col, regexp_replace}

val trimColumns = customers.schema.fields.filter(_.dataType.isInstanceOf[StringType])
val arrayOfDf = trimColumns.map { f =>
  customers.withColumn(f.name, regexp_replace(col(f.name), " ", "_"))
}
The above code produces an array of DataFrames, where each element has the spaces replaced in just one of the string columns.
scala> arrayOfDf(0).select("Name").show(4)
+-------------+
| Name|
+-------------+
| Brad_Eason|
|Tracy_Hopkins|
| Todd_Boyes|
| Roy_Phan|
+-------------+
Now I need to pick the first string column from the first element, the second string column from the second element of the array, and so on...
Is there any better way for this approach?
Upvotes: 0
Views: 964
Reputation: 802
Instead of the arrayOfDf logic, use foldLeft as below:
val outputDf = trimColumns.foldLeft(customers) { (agg, tf) =>
  agg.withColumn(tf.name, regexp_replace(col(tf.name), " ", "_"))
}
Output will be:
+------+-------------+------+---------------+--------------+
|CustId| Name|Salary| State| Country|
+------+-------------+------+---------------+--------------+
| 1| Brad_Eason| 100|New_South_Wales| Australia|
| 2|Tracy_Hopkins| 200| England|United_Kingdom|
| 3| Todd_Boyes| 300| England|United_Kingdom|
| 4| Roy_Phan| 400| Minnesota| United_States|
| 5| Harold_Ryan| 500| Washington| United_States|
+------+-------------+------+---------------+--------------+
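If you prefer a single projection over repeated withColumn calls, the same replacement can be expressed with one select that rewrites string columns and passes the rest through. A minimal sketch, assuming the customers DataFrame from the question:

```scala
import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.StringType

// One expression per column: replace spaces in string columns,
// keep every other column unchanged. The .as(f.name) preserves
// the original column name after regexp_replace.
val cleaned = customers.select(customers.schema.fields.map { f =>
  if (f.dataType == StringType)
    regexp_replace(col(f.name), " ", "_").as(f.name)
  else
    col(f.name)
}: _*)
```

Both versions build the same kind of plan; select just makes the column-wise rewrite explicit in a single pass over the schema.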
Upvotes: 2