Puneeth Reddy V

Reputation: 1568

Remove spaces from all columns using Spark

I have a DataFrame with the following columns:

+------+-------------+------+---------------+--------------+
|CustId|         Name|Salary|          State|       Country|
+------+-------------+------+---------------+--------------+
|     1|   Brad Eason|   100|New South Wales|     Australia|
|     2|Tracy Hopkins|   200|        England|United Kingdom|
|     3|   Todd Boyes|   300|        England|United Kingdom|
|     4|     Roy Phan|   400|      Minnesota| United States|
|     5|  Harold Ryan|   500|     Washington| United States|
+------+-------------+------+---------------+--------------+

To replace every space in each string column with _, I wrote the following:

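import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.StringType

// Keep only the string-typed columns
val trimColumns = customers.schema.fields.filter(_.dataType.isInstanceOf[StringType])
// One DataFrame per string column, each with only that column cleaned
val arrayOfDf = trimColumns.map(f => {
    customers.withColumn(f.name, regexp_replace(col(f.name), " ", "_"))
})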

The code above produces an array of DataFrames, one per string column. In each element only the corresponding string column has its spaces replaced.

scala> arrayOfDf(1).select("Name").show(4)
+-------------+
|         Name|
+-------------+
|   Brad_Eason|
|Tracy_Hopkins|
|   Todd_Boyes|
|     Roy_Phan|
+-------------+

What I need now is a single DataFrame, which means picking the first string column from the first element of the array, the second from the second element, and so on.

Is there a better approach?

Upvotes: 0

Views: 964

Answers (1)

avikm

Reputation: 802

Instead of building arrayOfDf, use foldLeft so that each replacement is applied to the result of the previous one:

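// Start from the original DataFrame (customers in the question) and
// thread it through one withColumn call per string column
val outputDf = trimColumns.foldLeft(customers)((agg, tf) =>
  agg.withColumn(tf.name, regexp_replace(col(tf.name), " ", "_"))
)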

Output will be:

+------+-------------+------+---------------+--------------+
|CustId|         Name|Salary|          State|       Country|
+------+-------------+------+---------------+--------------+
|     1|   Brad_Eason|   100|New_South_Wales|     Australia|
|     2|Tracy_Hopkins|   200|        England|United_Kingdom|
|     3|   Todd_Boyes|   300|        England|United_Kingdom|
|     4|     Roy_Phan|   400|      Minnesota| United_States|
|     5|  Harold_Ryan|   500|     Washington| United_States|
+------+-------------+------+---------------+--------------+
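
If there are many string columns, the same result can probably be had with a single select rather than one withColumn per column. The Spark documentation for withColumn notes that calling it repeatedly in a loop can generate large plans and recommends a select over multiple columns instead. Below is a minimal sketch of that idea, meant for spark-shell or a script; the helper name underscoreSpaces and the one-row sample DataFrame are illustrative assumptions, not part of the question's code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.StringType

// Hypothetical helper: builds one projection that rewrites every string
// column and passes all other columns through unchanged.
def underscoreSpaces(df: DataFrame): DataFrame = {
  val projected = df.schema.fields.map { f =>
    if (f.dataType == StringType)
      regexp_replace(col(f.name), " ", "_").as(f.name) // keep the original column name
    else
      col(f.name)
  }
  df.select(projected: _*)
}

// Usage with an assumed local session and one sample row from the question
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("underscore-spaces")
  .getOrCreate()
import spark.implicits._

val customers = Seq((1, "Brad Eason", 100, "New South Wales", "Australia"))
  .toDF("CustId", "Name", "Salary", "State", "Country")

val cleaned = underscoreSpaces(customers)
cleaned.show()
```

Column order is preserved because the projection is built by walking the schema in order. Note that col(f.name) assumes column names without dots or other characters that need backtick quoting.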

Upvotes: 2
