Reputation: 2697
I tried to filter null values out of an RDD, but it didn't work. Here's my code:
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
val raw_hbaseRDD = hBaseRDD.map {
  kv => kv._2
}
val Ratings = raw_hbaseRDD.map { result =>
  val x = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("user")))
  val y = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("item")))
  val z = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("rating")))
  (x, y, z)
}
Ratings.filter ( x => x._1 != null )
Ratings.foreach(println)
When debugging, null values still appeared after the filter:
(3359,1494,4)
(null,null,null)
(28574,1542,5)
(null,null,null)
(12062,1219,5)
(14068,1459,3)
Any better ideas?
Upvotes: 3
Views: 11343
Reputation: 51
Try the below:
Ratings.filter ( x => x._1 != "")
There is a similar example in "Filter rdd lines by values in fields Scala".
Upvotes: 0
Reputation: 664
Ratings.filter ( x => x._1 != null )
This does produce a filtered RDD, but you never use the RDD it returns. You can try:
Ratings.filter(_._1 != null).foreach(println)
Upvotes: 5
Reputation: 37852
RDDs are immutable objects: a transformation on an RDD doesn't change the original RDD, but rather produces a new one. So you should use the RDD returned from filter (just like you do with the result of map) if you want to see the effect of filter:
val result = Ratings.filter ( x => x._1 != null )
result.foreach(println)
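To see the same behaviour without a cluster, here is a minimal sketch that uses a plain Scala List as a stand-in for the RDD. That substitution is an assumption made so the example runs on its own; the object name FilterDemo and the value names are hypothetical, and the rows are copied from the question's output. As with an RDD, filter hands back a new collection and leaves the one it was called on unchanged:

```scala
// Plain Scala collections stand in for the RDD here (an assumption, so the
// sketch runs without Spark). RDD.filter likewise returns a new dataset.
object FilterDemo {
  // Sample rows copied from the question's debug output
  val ratings: List[(String, String, String)] = List(
    ("3359", "1494", "4"),
    (null, null, null),
    ("28574", "1542", "5")
  )

  // filter returns a new collection; `ratings` itself is not modified
  val cleaned: List[(String, String, String)] = ratings.filter(_._1 != null)

  def main(args: Array[String]): Unit = {
    println(ratings.size) // 3: the original still contains the null row
    println(cleaned.size) // 2: only the returned collection is filtered
    cleaned.foreach(println)
  }
}
```

Printing ratings after the filter call would still show the (null,null,null) row, which matches what the question observed.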
Upvotes: 5