raul

Reputation: 781

converting rdd list of list unicode values into string

I have a Spark RDD whose values are lists of unicode strings:

ex_rdd.take(5)
[[u'70450535982412348', u'1000000923', u'1'],
 [u'535982417348', u'1000000923', u'1'],
 [u'50535929459899', u'1000000923', u'99'],
 [u'8070450535936297811', u'1000000923', u'1'],
 [u'5937908667', u'1000000923', u'1']]

Writing them to an HDFS file raises a unicode error. How do I convert them to strings and write them to a file efficiently in PySpark? The HDFS output file should look like this:

 70450535982412348,1000000923,1
 535982417348,1000000923,1 

and so on

Upvotes: 0

Views: 705

Answers (1)

A.M.

Reputation: 44

You can use Python's join method for strings, along with the map and saveAsTextFile operations on pyspark.RDD objects (see the pyspark.RDD documentation).

# Join each list's fields with commas, then write one line per record to HDFS.
ex_rdd.map(lambda L: ','.join(L)).saveAsTextFile('/path/to/hdfs/save/file')

This should be available even on early versions (>= 1.0) of PySpark, if I'm not mistaken.

I'm not sure what you mean by "unicode error". Is this an exception in Python? Or is this an exception in the Java internals?
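
If it is a Python UnicodeEncodeError raised while the lines are written out, explicitly encoding each field to UTF-8 before joining usually avoids it. Here's a minimal sketch, assuming Python 2 (which the u'' literals in your output suggest) and a hypothetical output path:

# Encode each unicode field to a UTF-8 byte string before joining,
# so saveAsTextFile writes plain byte strings rather than implicitly
# encoding unicode objects as ASCII.
encoded = ex_rdd.map(lambda row: ','.join(s.encode('utf-8') for s in row))
encoded.saveAsTextFile('/path/to/hdfs/save/file')

saveAsTextFile writes one line per RDD element into part-* files under that directory, which matches the output format shown in the question.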

Upvotes: 1
