Reputation: 167
I am trying to convert an RDD of lists to a DataFrame in Spark.
RDD:
['ABC', 'AA', 'SSS', 'color-0-value', 'AAAAA_VVVV0-value_1', '1', 'WARNING', 'No test data for negative population! Re-using negative population for non-backtest.']
['ABC', 'SS', 'AA', 'color-0-SS', 'GG0-value_1', '1', 'Temp', 'After, date differences are outside tolerance (10 days) 95.1% of the time']
This is the content of the RDD: multiple lists.
How do I convert this to a DataFrame? Currently it is converted into a single column, but I need multiple columns.
DataFrame:
+--------------+
|            _1|
+--------------+
|['ABC', 'AA...|
|['ABC', 'SS...|
Upvotes: 0
Views: 1953
Reputation: 61
Just use Row.fromSeq:
import org.apache.spark.sql.Row
rdd.map(x => Row.fromSeq(x)).toDF
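A note on the snippet above: as far as I know, Spark's implicits do not supply an encoder for Row, so calling .toDF directly on an RDD of Row may not compile. A fuller sketch pairs Row.fromSeq with an explicit schema and spark.createDataFrame. Everything below except Row.fromSeq is an assumption for illustration: the helper name toDataFrame, the column names col1..col8, the all-string schema, and the local SparkSession.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddToDataFrame {
  // Hypothetical helper: builds an all-string schema from the given column
  // names and applies it to the rows produced by Row.fromSeq.
  def toDataFrame(spark: SparkSession, rdd: RDD[Seq[String]], columns: Seq[String]): DataFrame = {
    val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))
    spark.createDataFrame(rdd.map(values => Row.fromSeq(values)), schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()

    // Stand-in for the RDD from the question: one sequence of strings per record.
    val rdd = spark.sparkContext.parallelize(Seq(
      Seq("ABC", "AA", "SSS", "color-0-value", "AAAAA_VVVV0-value_1", "1", "WARNING",
        "No test data for negative population! Re-using negative population for non-backtest."),
      Seq("ABC", "SS", "AA", "color-0-SS", "GG0-value_1", "1", "Temp",
        "After, date differences are outside tolerance (10 days) 95.1% of the time")
    ))

    // Assumed column names; the question does not name the columns.
    val columns = (1 to 8).map(i => s"col$i")

    val df = toDataFrame(spark, rdd, columns)
    df.show(false) // false disables truncation of long cell values

    spark.stop()
  }
}
```

Every record must have as many elements as the schema has fields; a mismatch typically surfaces as a runtime error when the DataFrame is evaluated rather than at compile time.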
Upvotes: 6