Reputation: 3
I can't figure out why the unbase64 function doesn't work in my Spark SQL query.
Here is an example. I'm trying to decode "VGhpcyBpcyBhIHRlc3Qh" by calling unbase64 inside a Spark SQL query. Any thoughts on why the output doesn't get decoded? Thanks.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import unbase64
sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)
log = [{"eventTime":"2015-12-14 15:27:00","id":"9ab0135f-b8a3-4312-9065-9f8874fd790c","fullLog":"VGhpcyBpcyBhIHRlc3Qh"}]
df = sqlContext.createDataFrame(log)
df.registerTempTable('data')
query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')
query.write.save("output", format="json")
The output is: {"test":"VGhpcyBpcyBhIHRlc3Qh"}
but I want it to be: {"test":"This is a test!"}
Upvotes: 0
Views: 4789
Reputation: 1676
It seems to work for me. Note that unbase64 returns a binary column, so I cast the result to string before collecting it:
# Run in the pyspark shell, where sc and sqlContext are already defined.
from pyspark.sql import HiveContext
from pyspark.sql import SQLContext
log = [("2015-12-14 15:27:00","9ab0135f-b8a3-4312-9065-9f8874fd790c","VGhpcyBpcyBhIHRlc3Qh")]
rdd_log = sc.parallelize(log)
df = sqlContext.createDataFrame(rdd_log, ["eventTime", "id", "fullLog"])
df.registerTempTable("data")
query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')
query = query.select(query.test.cast("string").alias('test'))
print(query.collect())
>> [Row(test=u'This is a test!')]
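For intuition about why the cast matters, here is a minimal plain-Python sketch (standard library only, no Spark) of the same two steps: Base64 decoding yields raw bytes, and a separate step turns those bytes into text. The helper name decode_base64_text and the UTF-8 encoding are my assumptions for illustration, not something from the original post.

```python
import base64


def decode_base64_text(encoded, encoding="utf-8"):
    """Hypothetical helper: decode a Base64 string into readable text."""
    # Step 1: Base64 decoding gives raw bytes (comparable to the binary
    # column that unbase64 produces in Spark).
    raw = base64.b64decode(encoded)
    # Step 2: interpret the bytes as text (comparable to cast("string")).
    # UTF-8 is assumed here.
    return raw.decode(encoding)


raw = base64.b64decode("VGhpcyBpcyBhIHRlc3Qh")
text = decode_base64_text("VGhpcyBpcyBhIHRlc3Qh")
print(text)
```

The intermediate value raw is a bytes object rather than a string, which is the plain-Python counterpart of the binary column in the query above.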
Upvotes: 2