Reputation: 421
I've a dataframe where some of the fields are having the date format of D.HH:mm:ss, D.HH:mm:ss.SSSSSSS & HH:mm:ss.SSSSSSS. I'll need to filter only the date of type HH:mm:ss.SSSSSSS and convert this date to seconds(integer).
I've written below scala code that converts the date to seconds. I need help in filtering the date type of a particular format(HH:mm:ss.SSSSSSS) only and skip other formats of date in a dataframe. Any help would be appreciated.
def hoursToSeconds(a: Any): Int = {
val sec = a.toString.split('.')
val fields = sec(0).split(':')
val creationSeconds = fields(0).toInt*3600 + fields(1).toInt*60 + fields(2).toInt
return creationSeconds
}
Upvotes: 0
Views: 227
Reputation: 14845
The task can be split up into two parts:
Create some test data:
val df = Seq(
("one", "1.09:39:26"),
("two", "1.09:39:26.1234567"),
("three", "09:39:26.1234567")
).toDF("info", "time")
Definition of regexp and udf:
val pattern = "\\A(\\d{1,2}):(\\d{2}):(\\d{2})\\.\\d{7}\\z".r
val toSeconds = udf{in: String => {
val pattern(hour, minute, second) = in
hour.toInt * 60 * 60 + minute.toInt * 60 + second.toInt
}}
The actual code:
df
.filter('time rlike pattern.regex)
.select('info, 'time, toSeconds('time).as("seconds"))
.show
prints
+-----+----------------+-------+
| info| time|seconds|
+-----+----------------+-------+
|three|09:39:26.1234567| 34766|
+-----+----------------+-------+
If the lines that do not have the correct format should be kept, the udf can be changed slightly and the filter has to be removed:
val pattern = "\\A(\\d{1,2}):(\\d{2}):(\\d{2})\\.\\d{7}\\z".r
val toSeconds = udf{in: String => {
in match {
case pattern(hour, minute, second)=> hour.toInt * 60 * 60 + minute.toInt * 60 + second.toInt
case _ => 0
}
}}
df
.select('info, 'time, toSeconds('time).as("seconds"))
.show
prints
+-----+------------------+-------+
| info| time|seconds|
+-----+------------------+-------+
| one| 1.09:39:26| 0|
| two|1.09:39:26.1234567| 0|
|three| 09:39:26.1234567| 34766|
+-----+------------------+-------+
Upvotes: 1
Reputation: 79
You can try matching using a regex with extractors like so:
val dateRegex = """(\d{2}):(\d{2}):(\d{2}).(\d{7})""".r
val D_HH_mm_ss = "1.12:12:12"
val D_HH_mm_ss_SSSSSSS = "1.12:12:12.1234567"
val HH_mm_ss_SSSSSSS = "12:12:12.1234567"
val dates = List(HH_mm_ss_SSSSSSS, D_HH_mm_ss_SSSSSSS, D_HH_mm_ss)
dates.foreach {
_ match {
case dateRegex(hh, mm, ss, sssssssss) => println(s"Yay! $hh-$mm-$ss")
case _ => println("Nay :(")
}
}
Upvotes: 0