Reputation: 1344
Can someone let me know how to filter files by datestamp?
I have the following files in their respective folders in Azure Data Lake:
adl://carlslake.azuredatalakestore.net/folderOne/filenr1_1166_2018-12-20%2006-05-52.csv
adl://carlslake.azuredatalakestore.net/folderTwo/filenr2_1168_2018-12-22%2006-07-31.csv
I have written the following script that will read all .csv files in both folders, but I only want to read the .csv files in each folder whose names match the current date.
test1 = spark.read.csv("adl://carlslake.azuredatalakestore.net/folderOne/",inferSchema=True,header=True)
test2 = spark.read.csv("adl://carlslake.azuredatalakestore.net/folderTwo/",inferSchema=True,header=True)
Can someone let me know how to tweak the above to read only the files matching the current date, e.g. the two .csv files are dated 2018-12-20 and 2018-12-22?
I thought it might have been written something like
test1 = spark.read.csv("adl://carlslake.azuredatalakestore.net/folderOne/", select(current_date)inferSchema=True,header=True)
But that didn't work
Upvotes: 0
Views: 1144
Reputation: 87
Just go with
test1 = spark.read.csv(f"adl://carlslake.azuredatalakestore.net/testfolder/RAW/*{today}.csv", inferSchema=True, header=True)
(note the f prefix on the string, which is needed for {today} to be interpolated). The other pattern, *_{today}*.csv, was not matching your file example above, filenr1_1166_2018-12-20%2006-05-52.csv.
Upvotes: 1
Reputation: 1624
Try something like
from datetime import datetime
today = datetime.today().date()
test1 = spark.read.csv(f"adl://carlslake.azuredatalakestore.net/folderOne/*_{today}*.csv")
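Spark itself can't be exercised outside a cluster, but the date-string and glob construction can be sketched on their own. This is a minimal sketch assuming the folder paths from the question; the spark.read.csv calls are left commented because they need a live SparkSession:

```python
from datetime import datetime
from fnmatch import fnmatch

# Build today's date in the same YYYY-MM-DD format that appears in the filenames.
today = datetime.today().strftime("%Y-%m-%d")

# The same glob works for both folders from the question.
path_one = f"adl://carlslake.azuredatalakestore.net/folderOne/*_{today}*.csv"
path_two = f"adl://carlslake.azuredatalakestore.net/folderTwo/*_{today}*.csv"

# Sanity check of the pattern against the example filename from the question
# (with the URL-encoded %20 decoded to a space), using a fixed date:
assert fnmatch("filenr1_1166_2018-12-20 06-05-52.csv", "*_2018-12-20*.csv")

# With a live session you would then read each folder, e.g.:
# test1 = spark.read.csv(path_one, inferSchema=True, header=True)
# test2 = spark.read.csv(path_two, inferSchema=True, header=True)
```

Keeping the date formatting explicit with strftime avoids relying on how a date object happens to render inside the f-string.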
Upvotes: 1