Reputation: 71
I have a text file in HDFS containing a list of ids, and I want to read it as a list of strings. When I do this:
spark.read.text(filePath).collect.toList
I get a List[org.apache.spark.sql.Row] instead. How do I read this file into a List[String]?
Upvotes: 0
Views: 3542
Reputation: 8529
If you use spark.read.textFile(filePath) instead, you will get a Dataset[String] rather than a DataFrame (that is, a Dataset[Row]). Then, when you collect, you will get an Array[String] instead of an Array[Row].
You can also convert a DataFrame with a single string column into a Dataset[String] using df.as[String]. So df.as[String].collect will return an Array[String] from a DataFrame (assuming the DataFrame contains a single string column; otherwise this will fail).
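Here is a minimal sketch of the two approaches above. It assumes a SparkSession is passed in (in spark-shell, the predefined spark value would do). IdReader and its method names are hypothetical helpers made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for illustration; not part of Spark.
object IdReader {
  // textFile yields a Dataset[String], so collect() returns Array[String] directly.
  def viaTextFile(spark: SparkSession, filePath: String): List[String] =
    spark.read.textFile(filePath).collect().toList

  // text yields a DataFrame with a single string column named "value";
  // as[String] converts it to a Dataset[String] before collecting.
  def viaAs(spark: SparkSession, filePath: String): List[String] = {
    import spark.implicits._ // supplies the implicit Encoder[String] that as[String] needs
    spark.read.text(filePath).as[String].collect().toList
  }
}
```

Outside spark-shell, the import of spark.implicits._ has to be written explicitly, as in viaAs; spark-shell performs it automatically.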
Upvotes: 3
Reputation: 214987
Use map(_.getString(0)) to extract the string value from each Row:
spark.read.text(filePath).map(_.getString(0)).collect.toList
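The sketch below spells out what the one-liner above relies on, assuming a SparkSession is passed in. RowIdReader and its method names are hypothetical and used only for illustration. Calling map on a DataFrame needs an implicit Encoder[String], which spark-shell brings into scope automatically but a standalone application must import. Collecting first and mapping over the resulting Array[Row] avoids the encoder.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for illustration; not part of Spark.
object RowIdReader {
  // Extracts the string inside Spark, then collects the results to the driver.
  def mapThenCollect(spark: SparkSession, filePath: String): List[String] = {
    import spark.implicits._ // supplies the Encoder[String] required by Dataset.map
    spark.read.text(filePath).map(_.getString(0)).collect().toList
  }

  // Collects Row objects to the driver first, then maps over the plain Scala array.
  def collectThenMap(spark: SparkSession, filePath: String): List[String] =
    spark.read.text(filePath).collect().map(_.getString(0)).toList
}
```

For a small id list that is collected to the driver anyway, either variant should give the same result.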
Upvotes: 3