Reputation: 1433
I'm using Spark 1.5 and I keep getting an "index out of range" error when I invoke .count(), .top(), or .take(x):
lateWest = westBound.filter(lambda line: line.split(',')[16] > 0)
print(type(lateWest))
# <class 'pyspark.rdd.PipelinedRDD'>
lateWest.count()
lateWest.first()
lateWest.take(3)
Any ideas why I am getting this error? I'm guessing it's because lateWest is empty as a result of the first command. But how can I check whether it is empty?
Upvotes: 0
Views: 2803
Reputation: 4310
Spark uses lazy evaluation. When you run the first line, the system doesn't actually run your lambda function; it just stores it in a Spark object. Only when you invoke an action such as count() does Spark run the lambda inside your filter, and that's where the error actually occurs. In other words, the error is telling you that at least one input line has fewer than 16 commas (fewer than 17 fields), so index 16 doesn't exist for it.
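You can reproduce the failure outside Spark and guard against it. A minimal sketch (plain Python, with made-up sample lines; the column index 16 comes from the question) — note that the original lambda also compares a string to 0, so casting to a number is likely what was intended:

```python
# A line with 17 fields has a valid index 16; a short line does not.
good = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,5"   # 17 fields
bad = "a,b,c"                                  # 3 fields -> fields[16] raises IndexError

def is_late(line):
    fields = line.split(',')
    # Guard: skip lines that don't have a 17th field instead of crashing.
    if len(fields) <= 16:
        return False
    # Cast before comparing; the string '5' > 0 is not a numeric test.
    return float(fields[16]) > 0

print(is_late(good))  # True
print(is_late(bad))   # False
```

Inside Spark the same guard goes into the filter, e.g. `westBound.filter(is_late)`. This IndexError only surfaces when an action (count, first, take) forces the lazy filter to run. To answer the emptiness question: `lateWest.isEmpty()` (available on RDDs since Spark 1.3) checks this without scanning the whole dataset the way `count()` does.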
Upvotes: 1