Reputation: 25406
I have the following working code:
def replaceNone(row):
    myList = []
    row_len = len(row)
    for i in range(0, row_len):
        if row[i] is None:
            myList.append("")
        else:
            myList.append(row[i])
    return myList

rdd_out = rdd_in.map(lambda row : replaceNone(row))
Here row is a Row object (from pyspark.sql import Row).
However, it is rather lengthy and ugly. Is it possible to avoid defining the replaceNone function by writing everything in the lambda directly? Or at least to simplify replaceNone()? Thanks!
Upvotes: 0
Views: 2812
Reputation: 132
I'm not sure what your goal is. It seems like you're just trying to replace all the None values in each row of rdd_in with empty strings, in which case you can use a list comprehension:
rdd_out = rdd_in.map(lambda row: [r if r is not None else "" for r in row])
map calls the lambda once for every row in rdd_in, and the list comprehension builds a new list from that row in which every None is replaced by an empty string.
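If you would rather keep a named function (your second option), replaceNone can shrink to a single return statement. The sketch below is only an illustration: the names replace_none and replace_falsy are hypothetical, and a plain tuple stands in for a pyspark Row (Row is a tuple subclass, so iterating over it works the same way). It also shows why the explicit is None test matters: the shorter v or "" would turn 0, False and other falsy values into empty strings too.

```python
def replace_none(row):
    """Return a new list with every None in row replaced by an empty string."""
    # Conditional expression: keep the value unless it is exactly None
    return ["" if v is None else v for v in row]


def replace_falsy(row):
    """Shorter, but different: `or` replaces every falsy value, not just None."""
    return [v or "" for v in row]


# A plain tuple used as a stand-in for a pyspark Row
sample = (0, None, "x", False)
print(replace_none(sample))   # [0, '', 'x', False]
print(replace_falsy(sample))  # ['', '', 'x', '']
```

Since RDD.map just takes a function, the lambda wrapper in the question is not needed either: rdd_in.map(replace_none) behaves the same as rdd_in.map(lambda row: replace_none(row)).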
The lambda-plus-list-comprehension approach worked on a trivial example (I defined a map helper because plain Python lists have no .map method):
def map(l, f):
    # Stand-in for RDD.map; note that this shadows Python's built-in map
    return [f(r) for r in l]

l = [[1,None,2],[3,4,None],[None,5,6]]
l2 = map(l, lambda row: [i if i is not None else "" for i in row])
print(l2)
# Output: [[1, '', 2], [3, 4, ''], ['', 5, 6]]
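The helper in the trivial example shadows Python's built-in map. A sketch of the same check using the built-in directly, on the same toy data rather than real RDD rows. Note that the built-in takes its arguments as (function, iterable), the reverse of the helper, and returns a lazy iterator in Python 3, hence the list() call:

```python
# Toy rows; in Spark these would be the elements of rdd_in
l = [[1, None, 2], [3, 4, None], [None, 5, 6]]

# Built-in map(function, iterable) applies the lambda to each row lazily;
# list() forces evaluation so the result can be printed
l2 = list(map(lambda row: ["" if i is None else i for i in row], l))
print(l2)  # [[1, '', 2], [3, 4, ''], ['', 5, 6]]
```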
Upvotes: 1