Noobie93

Reputation: 71

How to get a List[String] from a DataFrame

I have a text file in HDFS containing a list of ids that I want to read as a list of strings. When I do this:

spark.read.text(filePath).collect.toList 

I get a List[org.apache.spark.sql.Row] instead. How do I read this file into a List[String]?

Upvotes: 0

Views: 3542

Answers (2)

puhlen

Reputation: 8529

If you use spark.read.textFile(filePath) instead, you get a Dataset[String] rather than a DataFrame (which is a Dataset[Row]). Calling collect on it then returns an Array[String] instead of an Array[Row]; add .toList if you need a List[String].

You can also convert a DataFrame with a single string column into a Dataset[String] using df.as[String], so df.as[String].collect returns an Array[String]. This assumes the DataFrame contains exactly one string column; otherwise the conversion fails. Outside spark-shell, it also needs import spark.implicits._ in scope to supply the String encoder.
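
As a rough sketch of both options, assuming a SparkSession is passed in and the file holds one id per line (the object name ReadIds and the helper names viaTextFile and viaAs are made up for illustration, not part of Spark):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the names are illustrative only.
object ReadIds {

  // textFile returns a Dataset[String], so collect() yields Array[String].
  def viaTextFile(spark: SparkSession, filePath: String): List[String] =
    spark.read.textFile(filePath).collect().toList

  // Assumes df has exactly one string column (e.g. the "value" column
  // produced by spark.read.text); as[String] fails otherwise.
  def viaAs(spark: SparkSession, df: DataFrame): List[String] = {
    import spark.implicits._ // brings the implicit Encoder[String] into scope
    df.as[String].collect().toList
  }
}
```

Both helpers pull the whole file onto the driver, which is fine for a small list of ids but not for large inputs.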

Upvotes: 3

akuiper

Reputation: 214987

Use map(_.getString(0)) to extract the string value from each Row (outside spark-shell, this needs import spark.implicits._ in scope for the String encoder):

spark.read.text(filePath).map(_.getString(0)).collect.toList
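
A minimal sketch of this approach in a standalone program, alongside a variant that collects first and maps on the driver, so that no encoder is needed. The object and method names here are hypothetical, and a SparkSession is assumed to be supplied by the caller:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the names are illustrative only.
object RowValues {

  // Map inside Spark, then collect: Dataset.map needs an Encoder[String].
  def mapThenCollect(spark: SparkSession, filePath: String): List[String] = {
    import spark.implicits._ // supplies the implicit Encoder[String]
    spark.read.text(filePath).map(_.getString(0)).collect().toList
  }

  // Collect the Rows first, then map over the local List[Row] on the
  // driver: this is plain Scala collection code, so no encoder is required.
  def collectThenMap(spark: SparkSession, filePath: String): List[String] =
    spark.read.text(filePath).collect().toList.map(_.getString(0))
}
```

For a small id file, either variant should give the same list; the first does the extraction inside Spark before the data reaches the driver.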

Upvotes: 3
