Reputation: 63
I have a list of the form
[Row(_1=u'5', _2=u'6')]
I want to convert it into
[(5,6)]
using PySpark
Upvotes: 1
Views: 20113
Reputation:
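Row is a subclass of tuple, so all you need is:
rdd.map(tuple)
to get RDD[tuple], or
rdd.map(list)
to get RDD[list].
Note that either call leaves the values as strings (u'5', u'6'); cast them with int if you need numbers.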
Upvotes: 3
Reputation: 2108
If your [Row(_1=u'5', _2=u'6')] is an element of your RDD:
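from pyspark.sql import Row
# sc is the SparkContext that the pyspark shell creates for you
a = [Row(_1=u'5', _2=u'6')]
rdd = sc.parallelize(a)
print(rdd.take(1))
# >>> [Row(_1=u'5', _2=u'6')]
# A Row iterates over its values, so cast each one to int and rebuild a tuple
b = rdd.map(lambda row: tuple(int(x) for x in row))
print(b.take(3))
# >>> [(5, 6)]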
Upvotes: 5