Reputation: 81
I'm new to PySpark. I wrote this code in PySpark:
def filterOut2(line):
    return [x for x in line if x != 2]

filtered_lists = data.map(filterOut2)
but I get this error:
'list' object has no attribute 'map'
How do I perform a map operation on my data in PySpark so that I can filter it down to only the values for which my condition evaluates to true?
Upvotes: 6
Views: 23190
Reputation: 85432
map(filterOut2, data)
works here: your data is a plain Python list, not an RDD, which is why it has no .map method. The built-in map function does the job:
>>> data = [[1,2,3,5],[1,2,5,2],[3,5,2,8],[6,3,1,2],[5,3,2,5],[4,1,2,5]]
>>> def filterOut2(line):
...     return [x for x in line if x != 2]
...
>>> list(map(filterOut2, data))
[[1, 3, 5], [1, 5], [3, 5, 8], [6, 3, 1], [5, 3, 5], [4, 1, 5]]
If you instead see
map() takes exactly 1 argument (2 given)
it looks like you redefined map
somewhere. Try __builtin__.map(filterOut2, data)
(or builtins.map(filterOut2, data) on Python 3).
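A minimal sketch of that shadowing problem and the fix, assuming Python 3 (where the module is named builtins):

```python
import builtins

data = [[1, 2, 3, 5], [1, 2, 5, 2], [3, 5, 2, 8]]

def filterOut2(line):
    return [x for x in line if x != 2]

# Shadowing the built-in: after this assignment, map names a
# one-argument function, so map(filterOut2, data) raises a
# TypeError about the number of arguments.
map = lambda line: [x for x in line if x != 2]

# The original built-in is still reachable via the builtins module:
result = list(builtins.map(filterOut2, data))
print(result)  # [[1, 3, 5], [1, 5], [3, 5, 8]]
```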
Or, use a list comprehension:
>>> [filterOut2(line) for line in data]
[[1, 3, 5], [1, 5], [3, 5, 8], [6, 3, 1], [5, 3, 5], [4, 1, 5]]
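Equivalently, the per-line filtering can be done with the built-in filter and a lambda instead of a named helper (a sketch using the same data as above):

```python
data = [[1, 2, 3, 5], [1, 2, 5, 2], [3, 5, 2, 8], [6, 3, 1, 2], [5, 3, 2, 5], [4, 1, 2, 5]]

# filter keeps the elements for which the predicate is true;
# list() materializes each filtered line.
filtered = [list(filter(lambda x: x != 2, line)) for line in data]
print(filtered)  # [[1, 3, 5], [1, 5], [3, 5, 8], [6, 3, 1], [5, 3, 5], [4, 1, 5]]
```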
Upvotes: 6