Bharath

Reputation: 477

Convert spark dataframe to Array[String]

Can anyone tell me how to convert a Spark DataFrame into an Array[String] in Scala?

I have used the following:

val x = df.select(columns.head, columns.tail: _*).collect()

The above snippet gives me an Array[Row], not an Array[String].

Upvotes: 11

Views: 54802

Answers (4)

Areeha

Reputation: 833

If you are planning to read the dataset line by line, you can use the iterator over the dataset:

Dataset<Row> csv = session.read().format("csv")
        .option("sep", ",")
        .option("inferSchema", true)
        .option("escape", "\"")
        .option("header", true)
        .option("multiline", true)
        .load("users/abc/....");

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}

Upvotes: 1

loneStar

Reputation: 4010

DataFrame to Array[String]:

data.collect.map(_.toSeq).flatten

You can also use the following:

data.collect.map(row => row.getString(0))

If you have more columns, it is better to use the last one:

data.rdd.map(row => row.getString(0)).collect

Upvotes: 5

Sohum Sachdev

Reputation: 1397

This should do the trick:

df.select(columns: _*).collect.map(_.toSeq)

Upvotes: 14

Bharath

Reputation: 477

The answer was provided by a user named cricket_007. You can use the following to convert the Array[Row] to an Array[String]:

val x = df.select(columns.head, columns.tail: _*).collect().map(row => row.toString())
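Note that Row.toString renders a row's values inside square brackets (e.g. "[Alice,30]"); if you want the plain comma-joined values instead, Row.mkString(",") on Spark's Row produces those. A minimal stand-in sketch in plain Scala (no Spark session; the sample data is hypothetical, and each Seq plays the role of a collected Row's values):

```scala
// Stand-in for df.collect().map(_.toSeq): each Seq holds one row's values.
val collected: Array[Seq[Any]] = Array(Seq("Alice", 30), Seq("Bob", 25))

// Row.toString on a Spark Row renders like "[Alice,30]":
val bracketed: Array[String] = collected.map(_.mkString("[", ",", "]"))

// Row.mkString(",") would give the same values without the brackets:
val plain: Array[String] = collected.map(_.mkString(","))

println(bracketed.mkString(" "))  // [Alice,30] [Bob,25]
println(plain.mkString(" "))      // Alice,30 Bob,25
```

So if the bracketed strings from row.toString() are not what you want downstream, mapping with row.mkString(",") instead gives you clean comma-separated values.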

Thanks, Bharath

Upvotes: 0
