Reputation: 17
I have a question about replacing personal information with encrypted data using Spark.
For example, say I have a table like this:
std_name | phone_number |
---|---|
John | 585-1243-2156 |
Susan | 585-4567-2156 |
I want to replace phone_number with an encrypted form, like this:
std_name | phone_number |
---|---|
John | avawehna'vqqa |
Susan | vabdsvwegq'qb |
I have tried using `withColumn` with a `udf`, but it does not work well.
Can someone help me out?
Upvotes: 0
Views: 407
Reputation: 24478
You haven't provided your encryption function, so I'll assume the problem was something simple. A UDF is run separately for every row, so you can use ordinary Python logic inside it. The "encryption" below is only a placeholder that shuffles characters; put your real logic in its place.
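```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [('John', '585-1243-2156'),
     ('Susan', '585-4567-2156')],
    ['std_name', 'phone_number']
)

@F.udf  # default return type is StringType
def encrypting(data):
    # Null cells reach a Python UDF as None; pass them through unchanged.
    if data is None:
        return None
    # Encrypting logic (placeholder, NOT real encryption):
    # reverse the string, replace '-' with 'w', and prefix 'xyz'.
    encrypted_data = 'xyz' + data[::-1].replace('-', 'w')
    return encrypted_data

df = df.withColumn('phone_number', encrypting('phone_number'))
df.show()
# +--------+----------------+
# |std_name|    phone_number|
# +--------+----------------+
# |    John|xyz6512w3421w585|
# |   Susan|xyz6512w7654w585|
# +--------+----------------+
```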
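If you don't need to get the original phone numbers back, a keyed hash is a common, stronger replacement for the placeholder logic. This is pseudonymization rather than reversible encryption. The sketch below uses only the standard library (`hmac`, `hashlib`) and assumes a secret key; `SECRET_KEY` and `pseudonymize` are hypothetical names chosen for illustration. The function is plain Python, so you can test it on its own before wrapping it in a UDF.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key for illustration only; load a real one from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(data: Optional[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the HMAC-SHA256 hex digest of `data`, or None for null cells.

    This is a one-way keyed hash: the original value cannot be recovered,
    but equal inputs always map to equal outputs (so joins still work).
    """
    if data is None:  # Spark passes SQL NULL to a Python UDF as None
        return None
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    for phone in ["585-1243-2156", "585-4567-2156", None]:
        print(phone, "->", pseudonymize(phone))
```

It can be wrapped the same way as `encrypting`, e.g. `df.withColumn('phone_number', F.udf(pseudonymize)('phone_number'))`. The key matters: an unkeyed hash such as Spark's built-in `F.sha2(col, 256)` avoids Python UDF overhead, but phone numbers have so few possible values that an unkeyed hash can be reversed by brute force.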
Upvotes: 1