Reputation: 781
I have a Spark RDD whose values are unicode strings, as a list of lists:
ex_rdd.take(5)
[[u'70450535982412348', u'1000000923', u'1'],
[u'535982417348', u'1000000923', u'1'],
[u'50535929459899', u'1000000923', u'99'],
[u'8070450535936297811', u'1000000923', u'1'],
[u'5937908667', u'1000000923', u'1']]
When I write them to an HDFS file, it gives a unicode error. How do I convert them to strings and write them to a file efficiently in PySpark? The HDFS output file should look like this:
70450535982412348,1000000923,1
535982417348,1000000923,1
and so on
Upvotes: 0
Views: 705
Reputation: 44
You can use Python's join function for strings, along with the map and saveAsTextFile operations on pyspark.RDD objects (see the documentation).
ex_rdd.map(lambda L: ','.join(L)).saveAsTextFile('/path/to/hdfs/save/file')
These operations should be available even in earlier versions (>= 1.0) of PySpark, if I'm not mistaken.
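For completeness, here is a minimal end-to-end sketch; the SparkContext setup, app name, and sample data are assumptions for illustration:

from pyspark import SparkContext

sc = SparkContext(appName='rdd-to-csv')  # assumed setup; reuse your existing context if you have one

# Sample data matching the question's structure: lists of unicode strings.
ex_rdd = sc.parallelize([
    [u'70450535982412348', u'1000000923', u'1'],
    [u'535982417348', u'1000000923', u'1'],
])

# Join each inner list into one comma-separated line, then save.
# saveAsTextFile writes one part file per partition under the given directory.
ex_rdd.map(lambda row: ','.join(row)).saveAsTextFile('/path/to/hdfs/save/file')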
I'm not sure what you mean by "unicode error". Is this an exception in Python, or an exception in the Java internals?
Upvotes: 1