I'm relatively new to Spark and PySpark.
final_plogfiles = plogfiles.filter(lambda x: len(x)>0)
I wrote this code to filter out the empty lines from the RDD plogfiles, but it did not remove them.
I also tried:
plogfiles.filter(lambda x: len(x.split())>0)
But if I use plogfiles.filter(lambda x: x.split()), the leading and trailing whitespace in all lines gets trimmed.
I only want to filter out empty lines. I would like to know where I'm going wrong.
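A quick way to see what each predicate does is to run it on a few sample strings in plain Python. This sketch assumes the "empty" lines actually contain spaces or tabs (the sample list is made up for illustration) and uses the built-in filter() as a stand-in for RDD.filter, which likewise keeps an element when the predicate returns a truthy value and passes it through unchanged.

```python
# Plain-Python stand-in for an RDD: built-in filter() keeps an element when the
# predicate is truthy, the same contract as RDD.filter. Sample data is hypothetical.
lines = ["first line", "", "   ", "\t", "  indented line  "]

# len(x) > 0 only drops zero-length strings; whitespace-only lines survive.
by_length = list(filter(lambda x: len(x) > 0, lines))

# len(x.split()) > 0 also drops lines that contain nothing but whitespace.
by_tokens = list(filter(lambda x: len(x.split()) > 0, lines))

print(by_length)  # ['first line', '   ', '\t', '  indented line  ']
print(by_tokens)  # ['first line', '  indented line  ']
```

In both cases the surviving lines keep their original leading and trailing whitespace, because a filter predicate only decides whether an element is kept; it never replaces the element with the predicate's return value.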
Is plogfiles an RDD? The following works fine for me:
lines = sc.textFile(input_file)
non_empty_lines = lines.filter(lambda x: len(x) > 0)
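If the input does contain whitespace-only lines, a predicate based on str.strip() drops them while leaving every kept line exactly as it was. A minimal sketch follows; the helper name is_non_blank is hypothetical, and the PySpark call in the comment assumes lines is the RDD returned by sc.textFile as in the snippet above.

```python
def is_non_blank(line: str) -> bool:
    """Return True if the line has at least one non-whitespace character.

    strip() is only used for the test; the line itself is not modified,
    so leading and trailing whitespace survive in the filtered result.
    """
    return line.strip() != ""


# In PySpark (assuming `lines = sc.textFile(input_file)`):
#     non_empty_lines = lines.filter(is_non_blank)

# Local check on hypothetical sample strings:
sample = ["  keep me  ", "", " \t ", "x"]
kept = [line for line in sample if is_non_blank(line)]
print(kept)  # ['  keep me  ', 'x']
```

A named function works the same as a lambda here; Spark serializes either one and ships it to the executors.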