theMadKing

Reputation: 2074

Filtering After a Specific Date in Spark

I am trying to filter after a specific date in Spark. I have the following RDD of pairs of strings, where the first is a date and the second is a path. I want to check which paths have changed after a specific date:

val cleanRDD = oivRDD.map(x => (x(5), x(7)))

res16: Array[(String, String)] = Array( (2015-06-24,/), (2015-07-17,/cdh), (2015-06-26,/datameer), (2015-06-24,/devl), (2015-08-11,/dqa), (2015-03-12,/lake), (2015-02-13,/osa))

I'm using Java's SimpleDateFormat:

val sampleDate = new SimpleDateFormat("yyyy-MM-dd")
val filterRDD = cleanRDD.filter(x => dateCompare(x))

My date comparison function:

  def dateCompare(input:(String, String)): Boolean = {
    val date1 = sampleDate.format(input._1)
    val date2 = sampleDate.parse(date1)
    val date3 = sampleDate.parse("2015-07-01")
    if (date2.compareTo(date3) > 0)  true
    else
      false
  }

I am getting the following error:

15/08/12 10:21:16 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 10, edhpdn2128.kdc.capitalone.com): java.lang.IllegalArgumentException: Cannot format given Object as a Date

Upvotes: 1

Views: 4716

Answers (1)

pippobaudos

Reputation: 809

With the new DataFrame API, an expression like the following is valid:

dfLogging.filter(dfLogging("when") >= "2015-01-01")

Here the "when" column has timestamp type:

scala> dfLogging.printSchema()
root
 |-- id: long (nullable = true)
 |-- when: timestamp (nullable = true)
 |-- ...

This syntax is valid for Scala; it should be similar for Java and Python.
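If you want to stay with the RDD from the question, the exception most likely comes from sampleDate.format(input._1): format expects a Date, but input._1 is a String, so the string should be parsed directly instead. Below is a minimal sketch of that idea, not a definitive fix. It assumes Java 8+ (for java.time.LocalDate) and that every date string is in yyyy-MM-dd form; the object name DateFilterSketch and the sample value are hypothetical, with rows copied from the question's output.

```scala
import java.time.LocalDate

// Hypothetical wrapper object, used only for illustration.
object DateFilterSketch {
  // Cutoff taken from the question's dateCompare.
  val cutoff: LocalDate = LocalDate.parse("2015-07-01")

  // True when the record's date (first element) is strictly after the cutoff.
  // LocalDate.parse reads ISO yyyy-MM-dd strings directly, so no format() call is needed.
  def dateCompare(input: (String, String)): Boolean =
    LocalDate.parse(input._1).isAfter(cutoff)

  // Sample rows copied from the question's output (assumed representative).
  val sample: Seq[(String, String)] = Seq(
    ("2015-06-24", "/"), ("2015-07-17", "/cdh"), ("2015-06-26", "/datameer"),
    ("2015-06-24", "/devl"), ("2015-08-11", "/dqa"), ("2015-03-12", "/lake"),
    ("2015-02-13", "/osa")
  )

  // Keeps only the pairs dated after the cutoff.
  val changed: Seq[(String, String)] = sample.filter(dateCompare)
}
```

On the RDD the same predicate would be used as cleanRDD.filter(DateFilterSketch.dateCompare). Parsing inside the function also avoids sharing one SimpleDateFormat instance across tasks, which matters because SimpleDateFormat is not thread-safe.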

Upvotes: 1
