Reputation: 896
New to Spark.
I'd like to do some transformation on the "wordList" column of a Spark DataFrame, df, of type org.apache.spark.sql.DataFrame = [id: string, wordList: array<string>].
I use Databricks. df looks like:
+--------------------+--------------------+
| id| wordList|
+--------------------+--------------------+
|08b0a9b6-3b9a-47a...| [a]|
|23c2ef79-8dce-4ad...|[ag, adfg, asdfgg...|
|26a7682f-2ce6-4eb...|[ghe, gener, ghee...|
|2ab530b5-04bc-463...|[bap, pemm, pava,...|
+--------------------+--------------------+
More specifically, I have defined a function shrinkList(ol: List[String]): List[String] that takes a list and returns a shorter list, and I would like to apply it to the wordList column. The question is, how do I convert the row to a list?
df.select("wordList").map(t => shrinkList(t(1)))
gives the error: type mismatch;
found : Any
required: List[String]
Also, I'm not sure about "t(1)" here. I'd rather use the column name instead of the index, in case the order of the columns changes in the future. But I can't seem to make t$"wordList" or t.wordList or t("wordList") work. So instead of using t(1), what selector can I use to select the "wordList" column?
Upvotes: 0
Views: 771
Reputation:
Try:
df.select("wordList").map(t => shrinkList(t.getSeq[String](0).toList))
or
df.select("wordList").map(t => shrinkList(t.getAs[Seq[String]]("wordList").toList))
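getSeq[String](0) reads the column by position (0, because the select produces a single-column Row), while getAs[Seq[String]]("wordList") reads it by name, which is the selector you were asking about. Your original error comes from t(1): Row.apply returns Any, which the compiler can't match against List[String]. Also note that map on a DataFrame produces a Dataset, so the implicit encoders (import spark.implicits._) need to be in scope.

An alternative that avoids the map/Encoder machinery entirely is to wrap the function in a UDF and keep working with DataFrame columns. A minimal sketch, assuming a stand-in shrinkList (yours may differ):

import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for your shrinkList: keeps only words longer than 2 characters.
def shrinkList(ol: List[String]): List[String] = ol.filter(_.length > 2)

// Spark hands an array<string> column to a UDF as Seq[String].
val shrinkListUdf = udf((words: Seq[String]) => shrinkList(words.toList))

// Overwrite wordList with the shrunk version, keeping id intact.
val shrunk = df.withColumn("wordList", shrinkListUdf(col("wordList")))
shrunk.show(truncate = false)

This also sidesteps the column-order concern, since the UDF is applied to the column by name.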
Upvotes: 1