Reputation: 100
I've found some similar solutions, but none that accomplish exactly what I want to do. I have a set of key/value pairs that I want to use for string substitution. e.g.
val replacements = Map( "STREET" -> "ST", "STR" -> "ST")
I am reading a table into a dataframe, and I would like modify a column to replace all instances of the key in my map with their values. So in the above map, look at the "street" column and replace all values of "STREET" with "ST" and all values of "STR" with "ST" etc.
I've been looking at some foldLeft implementations, but haven't been able to finagle it into working.
A basic solution would be great, but an optimal solution would be something I could plug into a Column function that someone wrote that I was hoping to update. Specifically a line like this:
val CleanIt: Column = trim(regexp_replace(regexp_replace(regexp_replace(colName," OF "," ")," AT "," ")," AND "," "))
Upvotes: 0
Views: 1795
Reputation: 37852
You can create this helper method that transforms a given column and a map of replacements into a new Column
expression:
def withReplacements(column: Column, replacements: Map[String, String]): Column =
replacements.foldLeft[Column](column) {
case (col, (from, to)) => regexp_replace(col, from, to)
}
Then use it on your street
column with your replacements
map:
val result = df.withColumn("street", withReplacements($"street", replacements))
For example:
df.show()
// +------------+------+
// | street|number|
// +------------+------+
// | Main STREET| 1|
// |Broadway STR| 2|
// | 1st Ave| 3|
// +------------+------+
result.show()
// +-----------+------+
// | street|number|
// +-----------+------+
// | Main ST| 1|
// |Broadway ST| 2|
// | 1st Ave| 3|
// +-----------+------+
NOTE: the keys in the map must be valid regular expressions. That means, for example, that if you want to replace the string "St."
with "ST"
, you should use Map("St\\." -> "ST)
(escaping the dot, which otherwise would be interpreted as regex's "any")
Upvotes: 2