Bowen Peng

Reputation: 1825

How to convert RDD list to RDD row in PySpark

rdd = spark.sparkContext.parallelize(['a1', 'a2', 'a3', 'a4', 'a5', ])

# convert it to Rows as follows
..., ...
..., ...

# show result
rdd.collect()
[Row(col='a1'), Row(col='a2'), Row(col='a3'), Row(col='a4'), Row(col='a5')]

I know that in Java Spark we can use Row, but it does not seem to be implemented in PySpark.
What is the most suitable way to do this? Should I convert each element to a dict and then convert that back to an RDD?

Upvotes: 0

Views: 352

Answers (1)

Lamanus

Reputation: 13581

Import the Row class from pyspark.sql, then map each element into a Row.

from pyspark.sql import Row

rdd = spark.sparkContext.parallelize(['a1', 'a2', 'a3', 'a4', 'a5'])

# wrap each element in a Row
rdd.map(lambda x: Row(x)).collect()

[<Row('a1')>, <Row('a2')>, <Row('a3')>, <Row('a4')>, <Row('a5')>]
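
If you want the named field col shown in the question's expected output, pass the value to Row as a keyword argument instead of positionally:

# keyword argument gives the Row a named field
rdd.map(lambda x: Row(col=x)).collect()

[Row(col='a1'), Row(col='a2'), Row(col='a3'), Row(col='a4'), Row(col='a5')]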

Upvotes: 1
