Ian Murphy

Reputation: 21

Spark: Using a UDF to create an Array column in a Dataframe

I have a simple function that takes some XML in a field, parses the values, and returns a list:

<data>
   <datas a="1" b="2" c="3"/>
   <datas a="2" b="3" c="2"/>
</data>

becomes a nested list [[1,2,3],[2,3,2]]
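For context, a minimal sketch of such a parser (the actual myparser may differ; this version uses ElementTree and assumes the a/b/c attribute layout shown above):

import xml.etree.ElementTree as ET

# Hypothetical sketch of the parser: pulls the a/b/c attributes off each
# <datas> element and returns them as a nested list of ints.
def myparser(xml_string):
    root = ET.fromstring(xml_string)
    return [[int(row.get(k)) for k in ("a", "b", "c")] for row in root.findall("datas")]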

I've made this a UDF, and I'm making this call on my DataFrame:

from pyspark.sql.functions import udf

myudf = udf(myparser)
df2 = df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))

This works, except that newDataColumn comes back as type STRING instead of ARRAY, so I can't use any of the SQL array functions on it to access or work with individual elements.
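Checking the schema shows it (the exact output depends on the rest of df1, but the new column is a string):

df2.printSchema()
# root
#  |-- ...
#  |-- newDataColumn: string (nullable = true)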

I've confirmed in Python that the function itself returns a list.

Any idea what I'm doing wrong or how I could get this to be an array column type?

Upvotes: 1

Views: 148

Answers (1)

Ian Murphy

Reputation: 21

A friend of mine just told me: the solution is to pass the return data type to the udf function. Duh.
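For anyone else landing here, a minimal sketch of the fix (assuming the parser returns a nested list of ints, as in the question):

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

# Without an explicit returnType, udf defaults to StringType, which is why
# the column came back as STRING. Declaring the type gives an array column.
myudf = udf(myparser, ArrayType(ArrayType(IntegerType())))
df2 = df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))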

Upvotes: 1
