Reputation: 477
Can anyone tell me how to convert a Spark DataFrame into an Array[String] in Scala?
I have tried the following:
val x = df.select(columns.head, columns.tail: _*).collect()
The above snippet gives me an Array[Row], not an Array[String].
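For context, a minimal reproduction, assuming a SparkSession named spark and a hypothetical two-column DataFrame (the column names letter and number are illustrative only):

import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real DataFrame
val df: DataFrame = Seq(("a", 1), ("b", 2)).toDF("letter", "number")
val columns = df.columns

// collect() on a DataFrame always returns Array[Row], regardless of the selected columns
val x: Array[Row] = df.select(columns.head, columns.tail: _*).collect()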
Upvotes: 11
Views: 54802
Reputation: 833
If you plan to read the Dataset line by line, you can use an iterator over it:
Dataset<Row> csv = session.read().format("csv").option("sep", ",").option("inferSchema", true)
        .option("escape", "\"").option("header", true).option("multiline", true).load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}
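Since the question is about Scala, here is a rough Scala sketch of the same iterator approach, assuming a SparkSession named spark and an illustrative file path:

import scala.collection.JavaConverters._

// Hypothetical path; adjust to your environment
val csv = spark.read.format("csv").option("sep", ",").option("inferSchema", true)
  .option("escape", "\"").option("header", true).option("multiline", true)
  .load("users/abc/file.csv")

// toLocalIterator() returns a java.util.Iterator[Row], hence the asScala conversion
for (row <- csv.toLocalIterator().asScala) {
  val item: Array[String] = row.toString.split(",")
}

Bear in mind that splitting Row.toString on commas is fragile when field values themselves contain commas; reading individual fields with row.get(i) or row.getString(i) is safer.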
Upvotes: 1
Reputation: 4010
DataFrame to Array[String]:
data.collect.map(_.toSeq).flatten.map(_.toString)
You can also use the following:
data.collect.map(row => row.getString(0))
If you have more data, it is better to use the last one, which maps over the RDD before collecting:
data.rdd.map(row => row.getString(0)).collect
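A quick usage sketch of the getString variant, assuming a SparkSession named spark and a hypothetical single-column DataFrame:

import spark.implicits._

// Hypothetical sample data
val data = Seq("alpha", "beta", "gamma").toDF("value")

// Both yield Array("alpha", "beta", "gamma"): Array[String]
val viaCollect = data.collect.map(row => row.getString(0))
val viaRdd = data.rdd.map(row => row.getString(0)).collect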
Upvotes: 5
Reputation: 1397
This should do the trick:
df.select(columns: _*).collect.map(_.toSeq)
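Note that this returns an Array[Seq[Any]], one Seq per row. If you need one String per row, a small follow-up step works; a sketch, assuming columns holds the column names as strings:

import org.apache.spark.sql.functions.col

// Map the names to Columns for select, then join each row's values with a comma
val asStrings: Array[String] =
  df.select(columns.map(col): _*).collect.map(_.toSeq.mkString(","))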
Upvotes: 14
Reputation: 477
The answer was provided by a user named cricket_007. You can use the following to convert Array[Row] to Array[String]:
val x = df.select(columns.head, columns.tail: _*).collect().map { row => row.toString() }
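For the record, Row.toString wraps the values in square brackets (e.g. "[a,1]"); if you want plain comma-separated values, Row.mkString avoids the brackets:

// mkString joins the field values without the surrounding "[...]"
val y = df.select(columns.head, columns.tail: _*).collect().map(_.mkString(","))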
Thanks, Bharath
Upvotes: 0