davidcorigliano

Reputation: 3

How to use the unbase64 function in a PySpark SQL query?

I cannot figure out why the unbase64 function doesn't work in my Spark SQL query.

Here is an example. I'm trying to decode "VGhpcyBpcyBhIHRlc3Qh" by calling unbase64 inside a Spark SQL query. Any thoughts on why the output isn't decoded? Thanks.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import unbase64

sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)

log = [{"eventTime":"2015-12-14 15:27:00","id":"9ab0135f-b8a3-4312-9065-9f8874fd790c","fullLog":"VGhpcyBpcyBhIHRlc3Qh"}]

df = sqlContext.createDataFrame(log)
df.registerTempTable('data')

query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')
query.write.save("output", format="json")

The output is {"test":"VGhpcyBpcyBhIHRlc3Qh"}, but I want it to be {"test":"This is a test!"}.

Upvotes: 0

Views: 4789

Answers (1)

user3689574

Reputation: 1676

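It works for me once the result is cast to a string. unbase64 returns a binary column, and the JSON writer appears to base64-encode binary values again on output, which would explain why you get the original string back.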

from pyspark.sql import SQLContext

# sc and sqlContext are assumed to exist already, as in the question's setup
log = [("2015-12-14 15:27:00","9ab0135f-b8a3-4312-9065-9f8874fd790c","VGhpcyBpcyBhIHRlc3Qh")]
rdd_log = sc.parallelize(log)

df = sqlContext.createDataFrame(rdd_log, ["eventTime", "id", "fullLog"])
df.registerTempTable("data")

# unbase64 returns a binary column
query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')

# Cast the binary result to a string to get readable text
query = query.select(query.test.cast("string").alias('test'))

print(query.collect())

>> [Row(test=u'This is a test!')]
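
To see why the cast matters, here is a minimal plain-Python sketch (no Spark involved) using the standard base64 module. It assumes, as the output in the question suggests, that writing a binary column to JSON base64-encodes it again. The variable names are only for illustration.

```python
import base64

encoded = "VGhpcyBpcyBhIHRlc3Qh"

# Decoding base64 yields raw bytes, comparable to the binary column unbase64 returns
raw = base64.b64decode(encoded)

# Assumption: serializing those bytes to JSON base64-encodes them again,
# which would reproduce the original input string
round_tripped = base64.b64encode(raw).decode("ascii")

# Interpreting the bytes as text, comparable to cast("string") in Spark
text = raw.decode("utf-8")

print(round_tripped)
print(text)
```

If that assumption holds, the decode in the question did happen. The result was just written out as bytes rather than as text, so casting to string before saving should give the expected output.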

Upvotes: 2
