Reputation: 73
I am trying to automate my Spark code (in Scala or Python), and here is what I am trying to do:
The files in the S3 bucket are named in the format filename_2016_02_01.csv.gz.
From the S3 bucket, the Spark code should pick up the file name and create a DataFrame, for example:
df = sqlContext.read.format("com.databricks.spark.csv").options(header="true", delimiter=",", inferSchema="true").load("s3://bucketname/filename_2016_01_29.csv.gz")
So every day when I run the job, it should pick up that particular day's file and create a DataFrame, instead of me specifying the file name.
Any idea on how to write code for this?
Thanks in advance.
Upvotes: 1
Views: 1274
Reputation: 1476
If I understood you correctly, you want the file name to change automatically based on that day's date. If that's the case, here is a Scala solution.
I'm using joda-time to generate the date:
import org.joda.time.format.DateTimeFormat
import org.joda.time.{DateTimeZone, DateTime}
...
// current UTC date formatted as yyyy_MM_dd, e.g. 2016_02_01
val today = DateTime.now(DateTimeZone.UTC).toString(DateTimeFormat.forPattern("yyyy_MM_dd"))
val fileName = "filename_" + today + ".csv.gz"
...
Python solution:
from datetime import datetime
# current UTC date formatted as YYYY_MM_DD, e.g. 2016_02_01
today = datetime.utcnow().strftime('%Y_%m_%d')
file_name = 'filename_' + today + '.csv.gz'
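Putting the two pieces together, here is a minimal sketch of the daily job in Python, reusing the reader options and the s3://bucketname/ placeholder from your question (adjust both to your setup), and assuming a sqlContext is already available:

from datetime import datetime

# build today's file name (UTC) to match the filename_YYYY_MM_DD.csv.gz pattern
today = datetime.utcnow().strftime('%Y_%m_%d')
file_name = 'filename_' + today + '.csv.gz'

# load that day's file from S3 into a DataFrame
df = sqlContext.read.format("com.databricks.spark.csv") \
    .options(header="true", delimiter=",", inferSchema="true") \
    .load("s3://bucketname/" + file_name)

Note that the load will fail if the day's file has not landed in the bucket yet, so you may want to schedule the job after the upload or handle the missing-file error.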
Upvotes: 2