rholdberh

Reputation: 557

Replace a column value in the spark DataFrame

Could you please help me replace values in the columns of a Spark DataFrame? Here is the data:

data = [["1", "xxx", "company 0"],
        ["2", "xxx", "company 1"],
        ["3", "company 44", "company 2"],
        ["4", "xxx", "company 1"],
        ["5", "bobby", "company 1"]]


dataframe = spark.createDataFrame(data)

I am trying to replace "company" with "cmp". The string "company" can appear in different columns.

Upvotes: 0

Views: 1927

Answers (2)

vmartin

Reputation: 503

A functional programming approach using reduce:

from functools import reduce
from pyspark.sql import functions as F

cols = dataframe.columns
# Fold over the column names, replacing 'company' with 'cmp' in each column
result = reduce(lambda df, c: df.withColumn(c, F.regexp_replace(c, 'company', 'cmp')), cols, dataframe)
result.show()
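
To see why the reduce call does the same work as an explicit loop over the columns, here is a minimal plain-Python sketch that does not need Spark. The dict-of-lists `table` and the helper `replace_in_column` are hypothetical stand-ins for the DataFrame and for `withColumn` plus `regexp_replace`; they are assumptions for illustration only, not part of the PySpark API.

```python
import re
from functools import reduce

# Hypothetical stand-in for a DataFrame: column name -> list of string values.
table = {
    "_1": ["1", "2", "3"],
    "_2": ["xxx", "xxx", "company 44"],
    "_3": ["company 0", "company 1", "company 2"],
}

def replace_in_column(tbl, col):
    """Return a new table with 'company' replaced by 'cmp' in one column."""
    updated = dict(tbl)  # shallow copy, so the input table is left untouched
    updated[col] = [re.sub("company", "cmp", v) for v in tbl[col]]
    return updated

cols = list(table)

# Explicit loop: rebind the accumulator once per column.
looped = table
for c in cols:
    looped = replace_in_column(looped, c)

# reduce: the same sequence of steps, with the accumulator threaded through.
folded = reduce(replace_in_column, cols, table)
```

Both variants apply one replacement step per column and carry the intermediate result forward, which is the pattern the PySpark one-liner follows with `withColumn`.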

Upvotes: 1

pltc

Reputation: 6082

Because "company" may appear in any column, you have to loop through each column and apply regexp_replace to each of them:

from pyspark.sql import functions as F

cols = dataframe.columns

for c in cols:
    dataframe = dataframe.withColumn(c, F.regexp_replace(c, 'company', 'cmp'))

dataframe.show()

+---+------+-----+
| _1|    _2|   _3|
+---+------+-----+
|  1|   xxx|cmp 0|
|  2|   xxx|cmp 1|
|  3|cmp 44|cmp 2|
|  4|   xxx|cmp 1|
|  5| bobby|cmp 1|
+---+------+-----+
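
If the data might also contain "Company" with a capital letter, or words that merely start with "company", the pattern can be tightened. The sketch below uses Python's `re` module only to illustrate the pattern's behavior; the assumption is that the same pattern string would be passed to `F.regexp_replace`, which uses Java regular expressions, where the `(?i)` flag and `\b` word boundary are also supported. The sample strings are made up for the demonstration.

```python
import re

# (?i) makes the match case-insensitive; \b restricts it to whole words.
PATTERN = r"(?i)\bcompany\b"

# Hypothetical sample values, not taken from the question's data.
samples = ["company 0", "Company 1", "companywide", "xxx"]
results = [re.sub(PATTERN, "cmp", s) for s in samples]
```

With this pattern, "Company 1" is rewritten while "companywide" is left alone, whereas the plain 'company' pattern would also rewrite the latter.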

Upvotes: 1
