Reputation: 1825
rdd = spark.sparkContext.parallelize(['a1', 'a2', 'a3', 'a4', 'a5', ])
# convert it to Rows as follows
..., ...
..., ...
# show result
rdd.collect()
[Row(col='a1'), Row(col='a2'), Row(col='a3'), Row(col='a4'), Row(col='a5'), ]
I know that in Java Spark we can use Row, but that doesn't seem to be implemented in PySpark. So what is the most suitable way to do this? Convert each element to a dict and then convert that back to an RDD? A rough sketch of that idea is below.
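The dict-based workaround I had in mind is roughly the following sketch (the field name col is just a placeholder matching the expected output above):
# rough sketch of the dict idea: wrap each value in a dict keyed by 'col'
dict_rdd = rdd.map(lambda x: {'col': x})
dict_rdd.collect()
[{'col': 'a1'}, {'col': 'a2'}, {'col': 'a3'}, {'col': 'a4'}, {'col': 'a5'}]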
Upvotes: 0
Views: 352
Reputation: 13581
Row is available in PySpark as well; just import it from the pyspark.sql package:
from pyspark.sql import Row

rdd = spark.sparkContext.parallelize(['a1', 'a2', 'a3', 'a4', 'a5'])
# wrap each element in a Row
rdd.map(lambda x: Row(x)).collect()
[<Row('a1')>, <Row('a2')>, <Row('a3')>, <Row('a4')>, <Row('a5')>]
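If you specifically want the named field col shown in the question's expected output, a small variation (just a sketch, assuming the same spark session and input values) is to pass the value as a keyword argument, which sets the field name on each Row:
from pyspark.sql import Row

rdd = spark.sparkContext.parallelize(['a1', 'a2', 'a3', 'a4', 'a5'])
# keyword argument sets the field name, so each element becomes Row(col=...)
rdd.map(lambda x: Row(col=x)).collect()
[Row(col='a1'), Row(col='a2'), Row(col='a3'), Row(col='a4'), Row(col='a5')]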
Upvotes: 1