Si_CPyR

Reputation: 161

Iterate through range of dates in scala

How can I make o_data below cover multiple files? Assuming there's one file (tab-separated values) for each date, starting at 2018-09-01, I'd like to append all 30 files (9/1~9/30) and store the result in the o_data variable. My initial guess would be to use a for loop, but not being familiar with Scala, I'm not sure where to start.

The below works for one file.

val o_data = "test::repo/shared/[2018-09-01]"

Then I use

val data = tes.read(o_data)

to read the file. To get a full month of data, the only thing I can think of is to create a separate val for each file (o_data2, o_data3 ... o_data30), run the read function on each one, and combine the results at the end, but that sounds silly...

Upvotes: 0

Views: 995

Answers (2)

stack0114106

Reputation: 8711

To get the range for any month, use the java.time library. Check this out

scala> val o_data =  (1 to 31)
o_data: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)

scala> val (year,month) = (2018,9)
year: Int = 2018
month: Int = 9

scala> o_data.map( x => { val y=java.time.LocalDate.of(year,month,1); y.plusDays(x-1)} ).filter( _.getMonthValue==month).map(s"test::repo/shared/["+_.toString+"]").foreach(println)
test::repo/shared/[2018-09-01]
test::repo/shared/[2018-09-02]
test::repo/shared/[2018-09-03]
test::repo/shared/[2018-09-04]
test::repo/shared/[2018-09-05]
test::repo/shared/[2018-09-06]
.........
test::repo/shared/[2018-09-30]

scala>
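A shorter variant of the same idea, as a sketch: java.time.YearMonth already knows how many days a month has, so the 1 to 31 range and the filter can be dropped. The names MonthPaths and pathsFor are made up for illustration; the path layout is copied from the question.

```scala
import java.time.YearMonth

object MonthPaths {
  // Build one path per calendar day of the given month.
  // The "test::repo/shared/[...]" layout is taken from the question.
  def pathsFor(year: Int, month: Int): Seq[String] = {
    val ym = YearMonth.of(year, month)
    // atDay returns a LocalDate, whose toString is the ISO form yyyy-MM-dd
    (1 to ym.lengthOfMonth()).map(day => s"test::repo/shared/[${ym.atDay(day)}]")
  }
}
```

Calling MonthPaths.pathsFor(2018, 9) should give the 30 paths listed above, and the same call works for months of other lengths, including February in leap years.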

Upvotes: 1

jrook

Reputation: 3519

You can do something like:

    val o_data = (1 to 30).map(d => {
      val df = if(d<10) "0"+d else d 
      s"test::repo/shared/[2018-09-$df]"
    })

After the above, o_data will be:

test::repo/shared/[2018-09-01]
test::repo/shared/[2018-09-02]
test::repo/shared/[2018-09-03]
test::repo/shared/[2018-09-04]
test::repo/shared/[2018-09-05]
...
test::repo/shared/[2018-09-28]
test::repo/shared/[2018-09-29]
test::repo/shared/[2018-09-30]

The idea is to use Scala's string interpolation to construct the right filename from a number. The if statement ensures that there will be a 0 before the number if it is less than 10.

Edit: If you like one-liners (as I do), the above can be rewritten as follows (again using the capabilities string interpolation offers, and thanks to @Dima for the suggestion):

val files = (1 to 30).map(d => f"test::repo/shared/[2018-09-$d%02d]")

Edit 2: Since these are file names, we can use a file API to read them:

import scala.collection.mutable

val allLines: mutable.Buffer[String] = mutable.Buffer()
o_data.foreach(filename => {
  // assumes tes.read returns the lines of the file as a collection of String
  val lines = tes.read(filename)
  allLines ++= lines
  ... //do stuff with lines read from file: "filename"
})
allLines foreach println

Of course, you should be mindful of any errors that might arise from reading a bunch of files (a file that does not exist, etc.). The foreach loop goes through the filenames in o_data and processes them one by one. You can see here for a few examples of how to open and read files.
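One way to handle those errors, as a rough sketch: wrap each read in a Try so that one missing file does not abort the whole month. This assumes the files are readable with scala.io.Source and that Scala 2.13+ is available (for scala.util.Using); SafeRead, readLines and readAll are names invented for this example, and tes.read from the question could be swapped in if it behaves similarly.

```scala
import scala.io.Source
import scala.util.{Failure, Success, Try, Using}

object SafeRead {
  // Read every line of one file; a missing or unreadable file becomes a Failure.
  // Using closes the underlying source once the lines have been collected.
  def readLines(path: String): Try[List[String]] =
    Using(Source.fromFile(path))(_.getLines().toList)

  // Returns the (path, error) pairs that failed and the combined lines of the files that were read.
  def readAll(paths: Seq[String]): (Seq[(String, Throwable)], Seq[String]) = {
    val results = paths.map(p => p -> readLines(p))
    val failures = results.collect { case (p, Failure(e)) => p -> e }
    val lines = results.collect { case (_, Success(ls)) => ls }.flatten
    (failures, lines)
  }
}
```

With this, SafeRead.readAll(o_data) would return both the lines that could be read and the list of files that could not, so the caller can decide whether a gap in the month is acceptable.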

Edit 3: Aggregating all lines in the files can be achieved using a more functional style too:

import scala.io.Source.fromFile
val allLines = files.foldLeft(Iterator[String]())((f, g) => f ++ fromFile(g).getLines)
allLines foreach println

The advantage of this method is that it concatenates iterators, which may help if the files are large because the lines are not all held in memory at once. If you want a list of strings instead, the following can be done:

import scala.io.Source.fromFile
val allLines = files.foldLeft(List[String]())((f, g) => f ++ fromFile(g).getLines.toList)
allLines foreach println

This method works with any file reading technique that returns the lines in the file (tes.read in the OP's question).
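To make that point concrete, here is a small sketch where the reader is passed in as a function, so the combining step does not care where the lines come from. It uses flatMap rather than foldLeft; Combine and readMonth are hypothetical names, and the only assumption about the reader is that it returns the lines of one file.

```scala
object Combine {
  // `read` stands in for whatever reader is available (for example the question's tes.read);
  // it is assumed to return the lines of a single file.
  def readMonth(paths: Seq[String])(read: String => Iterable[String]): Seq[String] =
    paths.flatMap(read) // read each file in order and concatenate the lines
}
```

For instance, Combine.readMonth(files)(f => scala.io.Source.fromFile(f).getLines().toList) would behave like the List version above, and a stub reader can be passed in for testing without touching the file system.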

Upvotes: 1
