Hemant Kumar

Reputation: 63

Converting a Row into a list RDD in PySpark

I have a list of the form

[Row(_1=u'5', _2=u'6')]

I want to convert it into

[(5,6)]

using PySpark

Upvotes: 1

Views: 20113

Answers (2)

user9631871


Row is a subclass of tuple, so all you need is:

rdd.map(tuple)

to get RDD[tuple] or

rdd.map(list)

to get RDD[list].

Upvotes: 3

titiro89

Reputation: 2108

If each element of your RDD is a Row like Row(_1=u'5', _2=u'6'), convert every field to int inside map:

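from pyspark import SparkContext
from pyspark.sql import Row

sc = SparkContext.getOrCreate()  # reuses the shell's SparkContext if one exists

a = [Row(_1=u'5', _2=u'6')]
rdd = sc.parallelize(a)
print(rdd.take(1))
# >>> [Row(_1='5', _2='6')]

# Row is a tuple, so iterate over its fields and cast each one to int
b = rdd.map(lambda line: tuple(int(x) for x in line))
print(b.take(3))
# >>> [(5, 6)]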
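If the conversion is reused, or a type other than `int` is wanted, the lambda can be pulled out into a named function. This is a minimal sketch; `to_typed_tuple` and its `cast` parameter are hypothetical names, not part of PySpark. Because a Row is a tuple, the function can be checked on a plain tuple without starting Spark.

```python
def to_typed_tuple(row, cast=int):
    """Convert every field of a Row (or any tuple) with `cast`."""
    return tuple(cast(x) for x in row)

# With Spark (assumes the RDD of Rows named `rdd` from above):
# rdd.map(to_typed_tuple).collect()                       -> [(5, 6)]
# rdd.map(lambda r: to_typed_tuple(r, float)).collect()   -> [(5.0, 6.0)]

# Plain-Python check; a Row iterates like a tuple:
print(to_typed_tuple((u'5', u'6')))          # (5, 6)
print(to_typed_tuple((u'5', u'6'), float))   # (5.0, 6.0)
```

A module-level function like this is serialized and shipped to the executors the same way the lambda is, so either form works with `rdd.map`.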

Upvotes: 5
