Jaehyeok Kwak

Reputation: 17

How do I replace a column with its encrypted value using Spark (PySpark)?

I have a question about replacing personal information with encrypted data using Spark.

For example, say I have a table like this:

std_name  phone_number
John      585-1243-2156
Susan     585-4567-2156

I want to replace phone_number with its encrypted form, like this:

std_name  phone_number
John      avawehna'vqqa
Susan     vabdsvwegq'qb

I have tried using withColumn with a UDF, but it does not work as expected. Can someone help me out?

Upvotes: 0

Views: 407

Answers (1)

ZygD

Reputation: 24478

You haven't shared your encryption function, so I'll assume the problem was something simple. A UDF is called separately for each row's value, so inside it you can write ordinary Python to transform that value.

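from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [('John', '585-1243-2156'),
     ('Susan', '585-4567-2156')],
    ['std_name', 'phone_number']
)

@F.udf  # default return type is StringType
def encrypting(data):
    # Placeholder "encryption" logic for illustration only: reverse the
    # string and replace '-' with 'w'. This is NOT secure encryption.
    if data is None:  # null cells arrive in a Python UDF as None
        return None
    encrypted_data = 'xyz' + data[::-1].replace('-', 'w')
    return encrypted_data

# withColumn with an existing column name replaces that column
df = df.withColumn('phone_number', encrypting('phone_number'))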

df.show()
# +--------+----------------+
# |std_name|    phone_number|
# +--------+----------------+
# |    John|xyz6512w3421w585|
# |   Susan|xyz6512w7654w585|
# +--------+----------------+
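The transform in the answer is only a placeholder: reversing a string and swapping characters can be undone by anyone and offers no real protection. If the goal is to mask personal data while still being able to join or group on the masked column, one option (a swapped-in technique, not the answer's method) is keyed hashing with HMAC-SHA256 from the Python standard library. Note that this is one-way: unlike encryption, the original phone number cannot be recovered. A minimal sketch, assuming a hypothetical hard-coded key `SECRET_KEY` and a hypothetical function name `pseudonymize`:

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret key for illustration only; in practice load it from a
# secrets manager or environment variable instead of hard-coding it.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: Optional[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the keyed HMAC-SHA256 hex digest of `value`.

    This is keyed hashing (one-way), not encryption: the original value
    cannot be recovered, but equal inputs always produce equal outputs,
    so the masked column can still be used for joins and grouping.
    """
    if value is None:  # Spark passes SQL NULL to a Python UDF as None
        return None
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# To use it in Spark, wrap it the same way as the answer's function, e.g.:
#   pseudonymize_udf = F.udf(pseudonymize)
#   df = df.withColumn('phone_number', pseudonymize_udf('phone_number'))
```

If you need to get the original values back, use a real reversible cipher such as AES instead; HMAC digests can only be compared, not decrypted. Spark 3.3 and later also include built-in `aes_encrypt`/`aes_decrypt` SQL functions, which avoid the per-row overhead of a Python UDF.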

Upvotes: 1
