amro_ghoneim

Reputation: 535

How to load only files last modified within one day from subfolders in Azure Data Flow

I have the following directory structure on an Azure container:

-dwh-prod
  -Main_Folder
   -2021-01
     -file1.parquet
   -2021-02
     -file2.parquet
     -file3.parquet

The data is partitioned by year and month, which creates the subfolders, and the data files sit inside those subfolders. I want my data flow to load only the latest files, meaning those added within one day before the data flow pipeline runs.

In the 'Filter by last modified' setting under Source options, I tried currentUTC() as the End Time and AddDays(currentUTC(), -1) as the Start Time (that is, one day earlier), but it didn't work.

I also tried using currentTimestamp() instead but to no avail.


How do I go about solving this?

Upvotes: 1

Views: 1570

Answers (2)

Steve Johnson

Reputation: 8660

Your expression is correct. In your dataset, change the folder path from MainFolder to Main_Folder, and in your Source options set Main_Folder/*/*.parquet as the Wildcard path. Then it will work.
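
To illustrate which files that wildcard is meant to pick up, here is a minimal Python sketch. It is not Data Factory code: it uses pathlib's matching rules, where '*' stays within one path segment, and it assumes the service's wildcard behaves the same way. The readme.txt entry is a hypothetical extra file added only to show a non-match.

```python
from pathlib import PurePosixPath

# Paths from the directory listing in the question, relative to the container.
blob_paths = [
    "Main_Folder/2021-01/file1.parquet",
    "Main_Folder/2021-02/file2.parquet",
    "Main_Folder/2021-02/file3.parquet",
    "Main_Folder/readme.txt",  # hypothetical file, not in the question
]

# One '*' for the year-month subfolder, one for the file name.
pattern = "Main_Folder/*/*.parquet"

# PurePosixPath.match keeps '*' inside a single path segment.
matched = [p for p in blob_paths if PurePosixPath(p).match(pattern)]
print(matched)
```

The pattern needs two wildcard segments because the files sit one level below Main_Folder; a pattern such as Main_Folder/*.parquet would not reach into the year-month subfolders under these matching rules.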

Upvotes: 2

Joel Cochran

Reputation: 7728

I think your solution is close, but I'm not sure the folder name alone is sufficient. I'm also not familiar with "currentUTC"; in pipeline expressions, the function for the current UTC time is utcNow.

Below is an outline of how I would approach this problem.

Source Dataset

Add a Parameter for the subfolder (year-month):

[Screenshot: dataset parameter for the subfolder]

and then set the Folder path to an expression like:

[Screenshot: Folder path expression]

Pipeline

You could either pass in the subfolder or calculate it at runtime. My preference would be to pass it in as a parameter:

[Screenshot: pipeline parameter for the subfolder]
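
If you would rather calculate the subfolder at runtime, the logic is just formatting the run date as year-month. The Python sketch below shows that logic only; it is not pipeline expression syntax, and the helper names subfolder_for and folder_path are hypothetical.

```python
from datetime import datetime, timezone


def subfolder_for(run_time: datetime) -> str:
    """Return the year-month folder name (e.g. '2021-02') for a run time."""
    return run_time.astimezone(timezone.utc).strftime("%Y-%m")


def folder_path(subfolder: str, root: str = "Main_Folder") -> str:
    """Join the root folder from the question with the subfolder name."""
    return f"{root}/{subfolder}"


# Example run time; a real pipeline would use the current UTC time instead.
run_time = datetime(2021, 2, 15, 6, 30, tzinfo=timezone.utc)
print(folder_path(subfolder_for(run_time)))  # Main_Folder/2021-02
```

Keep in mind that a run on the first day of a month looks back into the previous day, whose files live in the previous month's folder, which is one reason passing the subfolder in explicitly can be easier to reason about.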

I would then add variables to calculate the start and end times. Since you run this daily, I would pin both values to the START of their day, so the window does not shift with the exact time the pipeline happens to run. I would also use the built-in getPastTime function:

[Screenshots: start time and end time variables]
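
As a rough sketch of the window those two variables describe, the Python below computes the start of the previous day and the start of the run day in UTC, then filters a file list by last-modified time. It only models the logic; the function names and the sample timestamps are hypothetical, and treating the end of the window as exclusive is an assumption, not documented service behavior.

```python
from datetime import datetime, timedelta, timezone


def daily_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return (start, end): midnight UTC of the previous day and of the run day."""
    end = run_time.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = end - timedelta(days=1)
    return start, end


def modified_in_window(files, start, end):
    """Keep file names whose last-modified time falls in [start, end)."""
    return [name for name, modified in files if start <= modified < end]


run_time = datetime(2021, 2, 15, 6, 30, tzinfo=timezone.utc)
start, end = daily_window(run_time)

# Hypothetical last-modified timestamps for two files from the question.
files = [
    ("file2.parquet", datetime(2021, 2, 14, 23, 59, tzinfo=timezone.utc)),
    ("file3.parquet", datetime(2021, 2, 13, 8, 0, tzinfo=timezone.utc)),
]
print(modified_in_window(files, start, end))
```

Because both ends are pinned to midnight, a run at 06:30 and a rerun at 09:00 on the same day select the same files.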

Now use these objects in your Source configuration:

[Screenshot: Source configuration using the parameter and variables]

Upvotes: 1
