Reputation: 557
Could you please help me replace column values in a Spark DataFrame?
data = [["1", "xxx", "company 0"],
        ["2", "xxx", "company 1"],
        ["3", "company 44", "company 2"],
        ["4", "xxx", "company 1"],
        ["5", "bobby", "company 1"]]
dataframe = spark.createDataFrame(data)
I am trying to replace "company" with "cmp". The string "company" can appear in any of the columns.
Upvotes: 0
Views: 1927
Reputation: 503
A functional programming approach, using reduce to fold over the column names:
from functools import reduce
from pyspark.sql import functions as F
cols = dataframe.columns
# Start from dataframe and apply regexp_replace to each column in turn
reduce(lambda df, c: df.withColumn(c, F.regexp_replace(c, 'company', 'cmp')),
       cols, dataframe).show()
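To see what the reduce call above does, it may help to run the same fold on plain Python data. The sketch below is not Spark code: a dict of lists stands in for the DataFrame, the hypothetical helper `with_column` stands in for `withColumn`, and `re.sub` stands in for `F.regexp_replace`. The accumulator starts as the original table and is replaced by a new table once per column.

```python
import re
from functools import reduce

# Plain-Python stand-in for the DataFrame: column name -> list of values.
table = {
    "_1": ["1", "2", "3", "4", "5"],
    "_2": ["xxx", "xxx", "company 44", "xxx", "bobby"],
    "_3": ["company 0", "company 1", "company 2", "company 1", "company 1"],
}

def with_column(tbl, name, values):
    """Hypothetical helper: return a NEW table with one column replaced."""
    return {**tbl, name: values}

def replace_in_column(tbl, name):
    """One fold step: rewrite a single column, leave the others as they are."""
    return with_column(tbl, name, [re.sub("company", "cmp", v) for v in tbl[name]])

# reduce(step, items, initial): the table is threaded through every column name.
result = reduce(replace_in_column, list(table), table)
```

Each step returns a new table instead of mutating the old one, which is also how `withColumn` behaves: the original `dataframe` is left unchanged unless the result is assigned back to it.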
Upvotes: 1
Reputation: 6082
Because "company" may appear in any column, you have to loop over every column and apply regexp_replace to each of them:
from pyspark.sql import functions as F
cols = dataframe.columns
for c in cols:
    # Reassign so each replacement builds on the previous one
    dataframe = dataframe.withColumn(c, F.regexp_replace(c, 'company', 'cmp'))
dataframe.show()
+---+------+-----+
| _1| _2| _3|
+---+------+-----+
| 1| xxx|cmp 0|
| 2| xxx|cmp 1|
| 3|cmp 44|cmp 2|
| 4| xxx|cmp 1|
| 5| bobby|cmp 1|
+---+------+-----+
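One caveat: the pattern 'company' is case-sensitive, so a value such as "Company 1" would be left untouched. regexp_replace takes a Java regular expression, and Java regex supports the inline flag (?i) for case-insensitive matching, so a pattern like '(?i)company' should cover both spellings. Python's re module accepts the same inline flag, so the difference can be sketched without a Spark session. The sample values below are made up for illustration and are not from the question's data.

```python
import re

# Made-up sample values with mixed capitalisation (illustration only).
values = ["company 0", "Company 1", "COMPANY 2", "bobby"]

# Case-sensitive pattern: only the all-lowercase spelling is replaced.
case_sensitive = [re.sub("company", "cmp", v) for v in values]

# Inline (?i) flag at the start of the pattern: every capitalisation is replaced.
case_insensitive = [re.sub("(?i)company", "cmp", v) for v in values]
```

In the Spark loop, the corresponding change would be passing '(?i)company' as the pattern argument to F.regexp_replace.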
Upvotes: 1