Reputation: 21
I have a simple function that takes some XML in a field, parses the values, and returns a list:
<data>
<datas a="1" b="2" c="3"/>
<datas a="2" b="3" c="2"/>
</data>
becomes a nested list [[1,2,3],[2,3,2]]
I've made this a udf, and I'm making this call on my dataframe:
myudf=udf(myparser)
df2=df1.withColumn("newDataColumn",myudf(df1["xmldatafield"]))
This works, except that newDataColumn is of type string instead of array, so I can't use any of the SQL array functions to access or work with individual elements.
I've confirmed in Python that the function returns a list.
Any idea what I'm doing wrong or how I could get this to be an array column type?
Upvotes: 1
Views: 148
Reputation: 21
A friend of mine just told me: the solution is to pass the return type to udf(). Without it, udf() defaults to StringType, which is why the column came back as a string. Duh.
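For anyone landing here later, here is a minimal sketch of what that looks like. The body of myparser is my guess at a parser for the sample above (the real one wasn't posted), it assumes the datas elements are self-closing and the values are integers, and add_parsed_column is just a hypothetical wrapper around the original withColumn call:

```python
import xml.etree.ElementTree as ET


def myparser(xml_text):
    """Parse <datas a=".." b=".." c=".."/> children into a nested list of ints.

    Hypothetical stand-in for the parser from the question.
    """
    if xml_text is None:
        return None
    root = ET.fromstring(xml_text)
    return [
        [int(row.get(key)) for key in ("a", "b", "c")]
        for row in root.findall("datas")
    ]


def add_parsed_column(df1):
    """Hypothetical helper: add the parsed XML as an array<array<int>> column."""
    # Imported here so the parser above can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    # The second argument is the return type; leaving it out gives StringType.
    myudf = udf(myparser, ArrayType(ArrayType(IntegerType())))
    return df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))
```

With the return type declared, newDataColumn should show up as an array column in printSchema(), and element access such as df2["newDataColumn"][0] should work. If the parser returned strings instead of ints, the inner type would need to be StringType instead.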
Upvotes: 1