Reputation: 5165
I have a dataframe with a column of array type that I want to convert to string type.
I'm trying to convert it using concat_ws(","), but the conversion fails because the column's type is ARRAY&lt;MAP&lt;STRING, STRING&gt;&gt;.
Dataframe
from pyspark.sql.functions import concat_ws, col

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]
df = spark.createDataFrame(data=dataDictionary)
df.withColumn("_2", concat_ws(",", col("_2")))
df.printSchema()
Schema
root
|-- _1: string (nullable = true)
|-- _2: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
Error
AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "concat_ws(,, _2)"
due to data type mismatch: Parameter 2 requires the ("ARRAY<STRING>" or "STRING") type,
however "_2" has the type "ARRAY<MAP<STRING, STRING>>".;
'Project [_1#208, concat_ws(,, _2#209) AS _2#212]
+- LogicalRDD [_1#208, _2#209], false
How can I resolve this?
Upvotes: 0
Views: 42
Reputation: 8886
If I understand correctly, you need to extract the keys and values from each map first, turning the array of maps into an array of strings that concat_ws can join. Perhaps something like:
from pyspark.sql.functions import concat_ws, expr

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]
df = spark.createDataFrame(dataDictionary)
df.printSchema()
df.show(truncate=False)

# transform() turns each map into a "key:value" string, producing ARRAY<STRING>,
# which the outer concat_ws can then join with commas.
df = df.withColumn("_2", concat_ws(",", expr("transform(_2, x -> concat_ws(':', x.key, x.value))")))
df.printSchema()
df.show(truncate=False)
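To illustrate what those expressions produce without a Spark session, here is a plain-Python sketch of the same per-row logic, using the question's sample data (the helper name maps_to_string is made up for illustration):

```python
# Mirrors transform(_2, x -> concat_ws(':', x.key, x.value)) followed by
# concat_ws(',', ...), applied row by row to the sample data.
rows = [
    ('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
    ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}]),
]

def maps_to_string(maps):
    # Each map contributes one "key:value" string; the pieces are comma-joined.
    return ",".join(f"{m['key']}:{m['value']}" for m in maps)

converted = [(k, maps_to_string(v)) for k, v in rows]
print(converted)
# [('value1', 'Fruit:Apple,Colour:White'), ('value2', 'Fruit:Mango,Bird:Eagle,Colour:Black')]
```

So each row's ARRAY&lt;MAP&lt;STRING, STRING&gt;&gt; collapses into a single string such as "Fruit:Apple,Colour:White".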
Upvotes: 0