Reputation: 1284
PostgreSQL's ARRAY_TO_STRING() function allows you to run
SELECT array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*');
and gives you
array_to_string
-----------------
1,2,3,*,5
(1 row)
Can we do the same using Spark SQL?
What I really need is for a JSON structure to stay as a string. Thanks!
Upvotes: 1
Views: 1034
Reputation: 35249
The closest thing you can get without writing a UDF is concat_ws:
from pyspark.sql import functions as F
rdd = sc.parallelize(["""{"foo": 1.0, "bar": [1, 2, 3, null, 5]}"""])
spark.read.json(rdd).withColumn("bar", F.concat_ws(",", "bar")).show()
# +-------+---+
# | bar|foo|
# +-------+---+
# |1,2,3,5|1.0|
# +-------+---+
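If you are on Spark 2.4 or later, array_join also takes a null replacement, which is closer to array_to_string; a minimal sketch along those lines (the cast to array<string> is only there to be safe about element types):
spark.read.json(rdd).withColumn(
    "bar", F.array_join(F.col("bar").cast("array<string>"), ",", "*")
).show()
# should give bar = 1,2,3,*,5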
but as you can see, it ignores nulls. With a UDF you can handle them yourself:
@F.udf
def array_to_string(xs, sep, nafill):
    # join the elements with sep, substituting nafill for nulls
    return sep.join(str(x) if x is not None else str(nafill) for x in xs)
spark.read.json(rdd).withColumn("bar", array_to_string("bar", F.lit(","), F.lit("*"))).show()
# +---------+---+
# | bar|foo|
# +---------+---+
# |1,2,3,*,5|1.0|
# +---------+---+
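Since you ask about Spark SQL, you can also register the same logic and call it from SQL; a rough sketch (the view name df is just for illustration):
from pyspark.sql.types import StringType

def array_to_string_(xs, sep, nafill):
    # plain Python function registered under a name callable from SQL
    return sep.join(str(x) if x is not None else str(nafill) for x in xs)

spark.udf.register("array_to_string", array_to_string_, StringType())
spark.read.json(rdd).createOrReplaceTempView("df")
spark.sql("SELECT foo, array_to_string(bar, ',', '*') AS bar FROM df").show()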
but if:
What I really need is for a JSON structure to stay as a string
then don't parse it at all. For example, if you use the JSON reader:
from pyspark.sql.types import *
(spark.read
.schema(StructType([StructField("foo", StringType()), StructField("bar", StringType())]))
.json(rdd)
.show())
# +---+--------------+
# |foo| bar|
# +---+--------------+
# |1.0|[1,2,3,null,5]|
# +---+--------------+
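Assuming Spark 2.3 or later, the schema can also be passed as a DDL string, which reads a bit shorter:
spark.read.schema("foo STRING, bar STRING").json(rdd).show()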
Upvotes: 2