Reputation: 481
My Spark program reads a file that contains a gzip-compressed string encoded in base64. I have to decode it and then decompress it. I used Spark's unbase64 to decode it, which gives me a byte array:
bytedf = df.withColumn("unbase", unbase64(col("value")))
Is there any built-in method in Spark that decompresses the resulting byte array?
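For reference, a minimal sketch of how a row like this can be produced and decoded (the sample XML here is just a placeholder):
import base64, gzip
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, unbase64

spark = SparkSession.builder.getOrCreate()

# Placeholder payload: base64(gzip(xml)), the same shape as the data in the file.
sample = base64.b64encode(gzip.compress(b"<note>hello</note>")).decode("ascii")
df = spark.createDataFrame([(sample,)], ["value"])

# unbase64 turns the base64 string column into a binary (byte array) column.
bytedf = df.withColumn("unbase", unbase64(col("value")))
bytedf.printSchema()  # "unbase" is of binary type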
Upvotes: 6
Views: 3802
Reputation: 41
I had a similar case; here is what I did:
from pyspark.sql.functions import col, unbase64, udf
from gzip import decompress

# Decode the base64 payload into a binary (byte array) column.
bytedf = df1.withColumn("unbase", unbase64(col("payload")))

# gzip.decompress works on the raw bytes; decode the result back to a UTF-8 string.
decompress_func = lambda x: decompress(x).decode('utf-8')
udf_decompress = udf(decompress_func)
df2 = bytedf.withColumn('unbase_decompress', udf_decompress('unbase'))
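If the payloads are large, a pandas UDF version of the same idea can reduce per-row overhead. This is my own sketch, assuming Spark 3.x with PyArrow installed; the function name is made up:
import pandas as pd
from gzip import decompress
from pyspark.sql.functions import pandas_udf

@pandas_udf("string")
def decompress_udf(payload: pd.Series) -> pd.Series:
    # Each element of the binary column arrives as a bytes-like object.
    return payload.apply(lambda b: decompress(b).decode("utf-8"))

df2 = bytedf.withColumn("unbase_decompress", decompress_udf("unbase"))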
Upvotes: 4
Reputation: 481
I wrote a UDF that does the base64 decode and the gzip decompression in one step:
import base64, zlib
from pyspark.sql.functions import udf

def decompress(ip):
    bytecode = base64.b64decode(ip)
    # 32 + MAX_WBITS makes zlib auto-detect the gzip header.
    d = zlib.decompressobj(32 + zlib.MAX_WBITS)
    decompressed_data = d.decompress(bytecode)
    return decompressed_data.decode('utf-8')

decompress = udf(decompress)
decompressedDF = df.withColumn("decompressed_XML", decompress("value"))
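One refinement I would consider (not in the code above, the names here are my own): if some rows can hold empty or malformed payloads, a guarded variant returns NULL for those rows instead of failing the whole job:
import base64, zlib
from pyspark.sql.functions import udf

def decompress_safe(ip):
    if ip is None:
        return None
    try:
        raw = base64.b64decode(ip)
        return zlib.decompress(raw, 32 + zlib.MAX_WBITS).decode('utf-8')
    except (ValueError, zlib.error):
        # Not valid base64 / gzip: return NULL for this row.
        return None

decompress_safe_udf = udf(decompress_safe)
decompressedDF = df.withColumn("decompressed_XML", decompress_safe_udf("value"))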
Upvotes: 4
Reputation: 5834
Spark example using base64:
import base64
.
.
# Decode the base64 string column using an RDD map operation, or you may create a udf.
decoded_rdd = df.rdd.map(lambda row: base64.b64decode(row.value))
Read here for a detailed Python example.
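Continuing that sketch with my own (hypothetical) names: you still need to gunzip the decoded bytes, and you can turn the RDD back into a DataFrame afterwards:
import gzip

# Gunzip each decoded payload and wrap it in a one-field tuple so toDF can infer a schema.
xml_rdd = decoded_rdd.map(lambda b: (gzip.decompress(b).decode("utf-8"),))
xml_df = xml_rdd.toDF(["decompressed_xml"])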
Upvotes: 1