satish_venu

Reputation: 11

Spark Filter function with map

I am a newbie to Spark and am having issues trying to filter a map. I am trying to remove the header from the .csv file and filter out certain records, but for some reason my filter condition is not working.

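val dataWithHeader = sc.textFile("/user/skv/airlines.csv")
// split each line on commas and trim every field
val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))
val Header = headerAndRows.first
// drop the header row by comparing the first field with the header's first field
val data = headerAndRows.filter(_(0) != Header(0))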

val maps = data.map(x => Header.zip(x).toMap)
// The result looks like:
// res0: Array[scala.collection.immutable.Map[String,String]] =
//   Array(Map(Code -> "19031", Description -> "Mackey International Inc.: MAC"),
//         Map(Code -> "19032", Description -> "Munz Northern Airlines Inc.: XY"), ...
// Now when I try to filter the maps with the condition below, the filter does not work:

val result = maps.filter(x => x("Code") != "19031") 

airlines.csv looks like this:

 Code,Description
"19031","Mackey International Inc.: MAC"
"19032","Munz Northern Airlines Inc.: XY"
"19033","Cochise Airlines Inc.: COC"   
"19034","Golden Gate Airlines Inc.: GSA"  
"19035","Aeromech Inc.: RZZ" 
"19036","Golden West Airlines Co.: GLW"  
"19037","Puerto Rico Intl Airlines: PRN"  
"19038","Air America Inc.: STZ"  
"19039","Swift Aire Lines Inc.: SWT"
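A minimal local sketch of why the filter appears to do nothing. It assumes a plain `List[String]` in place of the RDD returned by `sc.textFile` (so it runs without Spark) and uses only the first two data rows of the file; the object name `QuoteMismatch` is hypothetical. The parsing steps are the same as in the question, and they show that the quote characters from the CSV end up inside each value.

```scala
// Hypothetical local reproduction: a List stands in for the Spark RDD
object QuoteMismatch {
  val lines: List[String] = List(
    " Code,Description",
    "\"19031\",\"Mackey International Inc.: MAC\"",
    "\"19032\",\"Munz Northern Airlines Inc.: XY\""
  )

  // Same steps as the question: split on commas, trim, drop the header, zip into maps
  val headerAndRows: List[Array[String]] = lines.map(x => x.split(",").map(_.trim))
  val header: Array[String] = headerAndRows.head
  val data: List[Array[String]] = headerAndRows.filter(_(0) != header(0))
  val maps: List[Map[String, String]] = data.map(x => header.zip(x).toMap)

  // The stored code still contains the double quotes read from the file
  val firstCode: String = maps.head("Code")

  // Comparing with the unquoted string never matches, so every row is kept
  val result: List[Map[String, String]] = maps.filter(x => x("Code") != "19031")

  def main(args: Array[String]): Unit = {
    println(firstCode)   // prints "19031" (the quotes are part of the value)
    println(result.size) // prints 2: nothing was filtered out
  }
}
```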

Upvotes: 1

Views: 2884

Answers (2)

Pawan B

Reputation: 4623

Since your data contains double quotes, you can get this working in two ways:

  1. Remove the double quotes from the data by replacing them (as answered by Raphael Roth).

  2. Compare your values with the double quotes included, like this:

 val result = maps.filter(x => { 
      x("Code") != "\"19031\""
    })
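Both options can be checked side by side in a small local sketch. It assumes a plain `List[String]` standing in for the RDD (no Spark needed) and a few rows from the question's file; the object `QuoteFixes` and the helper `parse` are hypothetical names introduced only for this illustration.

```scala
// Hypothetical local check of both options; a List stands in for the RDD
object QuoteFixes {
  val lines: List[String] = List(
    " Code,Description",
    "\"19031\",\"Mackey International Inc.: MAC\"",
    "\"19032\",\"Munz Northern Airlines Inc.: XY\"",
    "\"19033\",\"Cochise Airlines Inc.: COC\""
  )

  // Hypothetical helper: header removal and row -> Map, parameterised on field cleanup
  def parse(clean: String => String): List[Map[String, String]] = {
    val rows = lines.map(_.split(",").map(clean))
    val header = rows.head
    rows.filter(_(0) != header(0)).map(r => header.zip(r).toMap)
  }

  // Option 1: strip the quotes while parsing, then compare with the bare code
  val option1: List[Map[String, String]] =
    parse(_.trim.replace("\"", "")).filter(x => x("Code") != "19031")

  // Option 2: keep the quotes and compare with a quoted literal
  val option2: List[Map[String, String]] =
    parse(_.trim).filter(x => x("Code") != "\"19031\"")

  def main(args: Array[String]): Unit = {
    println(option1.map(_("Code"))) // List(19032, 19033)
    println(option2.map(_("Code"))) // List("19032", "19033")
  }
}
```

Both filters drop only the 19031 record; they differ in whether the remaining values still carry their quotes. In the Spark version, the same `filter` runs on the RDD, and `result.collect()` brings the rows back to the driver for inspection.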

Upvotes: 2

Raphael Roth

Reputation: 27383

Your values seem to contain an extra pair of double quotes, because the quotes in your CSV are read as part of each field.

Try replacing

val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))

with

// trim each field and strip the double quotes read from the CSV
val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim.replace("\"", "")))

Upvotes: 3
