Veena

Reputation: 31

PySpark: filtering empty lines from an RDD is not working

I'm relatively new to Spark and PySpark.

final_plogfiles = plogfiles.filter(lambda x: len(x)>0)

I wrote this code to filter the empty lines out of the RDD plogfiles, but it did not remove them.

I also tried

plogfiles.filter(lambda x: len(x.split())>0)

But if I use plogfiles.filter(lambda x: x.split()), the leading and trailing whitespace in every line gets trimmed.

I only want to filter out empty lines. I would like to know where I'm going wrong.

Upvotes: 3

Views: 5380

Answers (1)

K246

Reputation: 1107

Is plogfiles an RDD? The following works fine for me:

lines = sc.textFile(input_file)
non_empty_lines = lines.filter(lambda x: len(x) > 0)
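
If the len(x) > 0 check seems to have no effect, one possible cause (an assumption, since the question does not show the data) is that the "empty" lines actually contain whitespace such as spaces or a stray \r. Below is a minimal sketch of a predicate that drops blank and whitespace-only lines without altering the lines it keeps. The name is_non_blank and the sample data are hypothetical, and the demo uses Python's built-in filter so it runs without Spark:

```python
def is_non_blank(line):
    """Return True if the line has at least one non-whitespace character."""
    # strip() is applied only for the test; the line itself is not modified
    return len(line.strip()) > 0


# Hypothetical sample data standing in for the contents of plogfiles
sample_lines = ["  GET /index.html  ", "", "   ", "\r", "POST /login"]

# The kept strings are the originals, including their surrounding spaces
kept = list(filter(is_non_blank, sample_lines))
print(kept)

# With Spark, the same predicate would be passed to RDD.filter, e.g.:
# final_plogfiles = plogfiles.filter(is_non_blank)
```

Note that RDD.filter returns a new RDD and leaves plogfiles itself unchanged, so the result to inspect is final_plogfiles, not plogfiles.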

Upvotes: 1
