newSparkbabie

Reputation: 73

How to pick up a file name automatically when loading a DataFrame, instead of specifying it manually

I am trying to automate my Spark code in Scala or Python. Here is what I am trying to do:

Files in the S3 bucket are named like filename_2016_02_01.csv.gz.

The Spark code should pick up the file name from the S3 bucket and create a DataFrame from it.

Example:

df = sqlContext.read.format("com.databricks.spark.csv").options(header="true").options(delimiter=",").options(inferSchema="true").load("s3://bucketname/filename_2016-01-29.csv.gz")

So every day when I run the job, it should pick up that day's file and create a DataFrame, without me having to specify the file name.

Any idea how to write code for this?

Thanks in Advance.

Upvotes: 1

Views: 1274

Answers (2)

noorul

Reputation: 1353

load("s3://bucketname/{}".format(file_name))

Upvotes: -1

Urban48

Reputation: 1476

If I understood you correctly, you want the file name to change automatically based on the current date. If that's the case:

Here is a Scala solution.
I'm using joda-time to generate the date.

import org.joda.time.format.DateTimeFormat
import org.joda.time.{DateTimeZone, DateTime}
...

val today = DateTime.now(DateTimeZone.UTC).toString(DateTimeFormat.forPattern("yyyy_MM_dd"))
val fileName = "filename_" + today + ".csv.gz"

...

Python solution:

from datetime import datetime

today = datetime.utcnow().strftime('%Y_%m_%d')
file_name = 'filename_' + today + '.csv.gz'
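To connect this back to the load call from the question, here is a minimal sketch that builds the full S3 path and hands it to the reader. The helper names (daily_file_name, daily_s3_path, load_daily_frame) are made up for illustration, the bucket name and the "filename_" prefix are the placeholders from the question, and it assumes the spark-csv reader options used there.

```python
from datetime import datetime, timezone


def daily_file_name(day, prefix="filename_"):
    """Build the file name for a given date, e.g. filename_2016_02_01.csv.gz."""
    # The "filename_" prefix is the placeholder used in the question.
    return prefix + day.strftime("%Y_%m_%d") + ".csv.gz"


def daily_s3_path(day, bucket="bucketname"):
    """Build the full S3 path for that day's file (bucket name is a placeholder)."""
    return "s3://{}/{}".format(bucket, daily_file_name(day))


def load_daily_frame(sql_context, day=None):
    """Load the file for `day` (default: today in UTC) using the spark-csv format."""
    if day is None:
        day = datetime.now(timezone.utc)
    return (sql_context.read
            .format("com.databricks.spark.csv")
            .options(header="true", delimiter=",", inferSchema="true")
            .load(daily_s3_path(day)))
```

In the daily job you would call df = load_daily_frame(sqlContext). Passing an explicit date, for example load_daily_frame(sqlContext, datetime(2016, 2, 1)), lets you re-run the job for an earlier day.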

Upvotes: 2
