Reputation: 318
I am working through a process where I want to ingest a CSV file into a dataframe. The file is a daily delta file stored in Azure Data Lake Store.
DF = (
    spark.read
    .option("header", True)
    .option("inferSchema", "true")
    .option("delimiter", "|")
    .csv("folder2/folder1/Intenstion_file2020*.csv")
)
From the above code I collect all the files whose names start with "file2020". So if there are 10 such files, they all get put into one dataframe.
What I want to do, though, is instead of ingesting all 10 files into a dataframe, select only the file that matches the system date. So if I have the following files: 1) file2020/01/01 2) file2020/01/02 3) file2020/01/09, I want only the third file to be ingested. Then the next run would select the file with the most current date.
I tried solving this by first getting a system date. This runs before the dataframe portion.
# Getting system time stamp
import datetime
date_value = datetime.datetime.now().strftime('%Y/%m/%d')
print(date_value)
So if I run the above in the notebook I would have date_value = "2020/01/09". What I want to do is then concatenate that value into the csv(path) in the dataframe example above.
So instead of having
.csv("folder2/folder1/Intenstion_file2020*.csv")
I would have something like:
.csv(concat_ws("....file" date_value "*.csv"))
So it would automatically find the file with the date that is closest to system date.
I tried some variations of the above, but I am missing the proper syntax, or perhaps what I am doing above is not possible. Has anyone tried to do this?
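To be clearer about what I am after, here is a minimal sketch of the idea using plain Python string formatting to build the path (untested; it assumes the date appears in the file name in the same %Y/%m/%d form, and keeps my folder layout from above):

```python
import datetime

# Format today's date the same way it appears in the file names,
# e.g. "2020/01/09" (assumption about the naming scheme)
date_value = datetime.datetime.now().strftime('%Y/%m/%d')

# The path argument to .csv() is an ordinary Python string,
# so it can be built with format() rather than concat_ws
path = "folder2/folder1/Intenstion_file{}*.csv".format(date_value)
print(path)

# The resulting string would then be passed straight to the reader:
# DF = spark.read.option("header", True).csv(path)
```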
Any help is appreciated.
Update 01/09/2020 I updated the question to make it clearer as to what I am trying to achieve.
Upvotes: 0
Views: 474
Reputation: 9
I think the way you are using concat_ws is wrong.
Please refer to this: https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.concat_ws
Moreover, you cannot concatenate a column and a plain string directly; the arguments should be columns, so a literal string must be wrapped with lit.
So use f.concat_ws("-", df.colA, f.lit("date_value"))
Upvotes: 0