atalpha

Reputation: 350

Process each comma-separated value in an RDD

I want to process each row in an RDD with comma separated values. What I am trying to achieve is to set all values close to zero to actual zeros. Here is what I did.

   val newRDD = oldRDD
      .map (line => line.split(","))
      .map (line => for(value <- line) {
        if(value.toDouble >= -0.01 && value.toDouble <= 0.01)
            0.toString()
          else
            value
        }
      )

All I am getting is empty parentheses () for every row. Am I making some stupid mistake?

Thanks.

Upvotes: 1

Views: 317

Answers (1)

T. Gawęda

Reputation: 16096

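You should add the yield keyword so that the for loop returns a collection of values. Without yield, a for loop is executed only for its side effects and returns Unit, which is printed as ():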

    .map (line => for (value <- line) yield {
      if (value.toDouble >= -0.01 && value.toDouble <= 0.01)
        "0"
      else
        value
    })

You can read it as: for every value in the line collection, yield the result of the if/else expression.
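To see the difference outside Spark, here is a minimal sketch using a plain Array instead of an RDD. The object name YieldSketch, the helper zeroSmall, and the sample line are hypothetical and chosen only for illustration; the function passed to map on an RDD would behave the same way.

```scala
object YieldSketch {
  // Hypothetical helper: replaces every value within [-threshold, threshold] by "0".
  def zeroSmall(line: String, threshold: Double = 0.01): Array[String] =
    for (value <- line.split(",")) yield {
      val d = value.toDouble // parse once instead of twice
      if (math.abs(d) <= threshold) "0" else value
    }

  def main(args: Array[String]): Unit = {
    val line = "0.005,1.5,-0.003,-2.0" // made-up sample row

    // Without yield the loop runs only for side effects and evaluates to Unit.
    val noYield: Unit = for (value <- line.split(",")) { value.toDouble }
    println(noYield)                       // prints ()

    // With yield the loop builds a new Array[String].
    println(zeroSmall(line).mkString(",")) // prints 0,1.5,0,-2.0
  }
}
```

With an RDD of lines, the same helper could be applied as oldRDD.map(line => YieldSketch.zeroSmall(line)).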

You can also use the DataFrame API to load a comma-separated file.
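A rough sketch of the DataFrame route, assuming Spark 2.0 or later (where SparkSession and the built-in CSV reader are available) and assuming every column is numeric. The object name CsvZeroing, the helper zeroSmall, and the path values.csv are placeholders, not anything from the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{abs, col, when}

object CsvZeroing {
  // Hypothetical helper: sets every value within [-threshold, threshold] to 0.0
  // in all columns. Assumes all columns are numeric.
  def zeroSmall(df: DataFrame, threshold: Double = 0.01): DataFrame = {
    val cols: Array[Column] = df.columns.map { name =>
      when(abs(col(name)) <= threshold, 0.0).otherwise(col(name)).as(name)
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-zeroing")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()

    // "values.csv" is a placeholder path; inferSchema parses the numbers as doubles.
    val df = spark.read.option("inferSchema", "true").csv("values.csv")
    zeroSmall(df).show()

    spark.stop()
  }
}
```

This keeps the values as numeric columns rather than strings, so there is no need to split lines or call toDouble by hand.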

Upvotes: 3
