pyspark trim and split

Question

I am trying to create an ArrayType from an StringType but I am unable to do a trim and split at the same time.

Create DataFrame

data = [
    (
        "1",
        "Example 1",
        "element1 | element two | element (three) ",        
    ),     
]
schema = StructType(
    [

        StructField("id", StringType(), True),
        StructField("name", StringType(), True),
        StructField("value", StringType(), True),
    ]
)
df = spark.createDataFrame(data, schema)

What I try to do

df = (df.withColumn("value_array", 
                    f.trim(f.split(f.col("value"), "\|")
                                                 )
                                         )
                )

The error that I get

AnalysisException: cannot resolve 'trim(split(`value`, '\|', -1))' due to data type mismatch: argument 1 requires string type, however, 'split(`value`, '\|', -1)' is of array type.;

Code that works but is not complete

df = (df.withColumn("value_arry", 
                    f.split(f.col("value"), "\|")
                    )
                )

The result I get

"[e1| e 2 | e3 ]" #These values contains spaces in 
 
+---+---------+---------------+-------------------+
| id|     name|          value|         value_arry|
+---+---------+---------------+-------------------+
|  1|Example 1|e1| e 2 | e3 3 |[e1,  e 2 ,  e3 3 ]|
|  2|Example 2|       e1 | e2 |        [e1 ,  e2 ]|
+---+---------+---------------+-------------------+

The result I would like to get would have each element without spaces

pyspark trim and split

Create DataFrame

What I try to do

The error that I get

Code that works but is not complete

The result I get

Spark Version 3.2.1

Answers (1)

Related Questions