David Ross

Reputation: 67

Copy data every 1 minute from DataLake by DataFactory

I have a Data Lake storage with the following folder structure:

{YEAR}
 - {MONTH}
  - {DAY}
   - {HOUR}
     - {sometext}_{YEAR}_{MONTH}_{DAY}_{HOUR}_{Minute}_{someuuid}.json


Could you please help me configure the Data Factory Copy data activity? I need to run a trigger every minute that copies the previous minute's data from Data Lake to Cosmos DB. I've tried a configuration where the first expression is

@formatDateTime(utcnow(),'yyyy/MM/dd/HH')

and the second one

@{formatDateTime(utcnow(),'yyyy')}_@{formatDateTime(utcnow(),'MM')}_@{formatDateTime(utcnow(),'dd')}_@{formatDateTime(utcnow(),'HH')}_@{formatDateTime(addMinutes(utcnow(), -1),'mm')}*.json

But it can skip some data, especially when the hour changes. I'm new to Data Factory and don't know the most efficient way to do this. Please help.

Upvotes: 0

Views: 196

Answers (1)

Joel Cochran

Reputation: 7768

The Pipeline Expression Language has a number of date functions built in. You can use the addMinutes function to shift a timestamp by whole minutes; a negative value such as -1 gives the previous minute.

To avoid clock skew, I would capture the utcnow() value once and store it in a variable without any formatting.

In another variable, apply addMinutes to the captured value rather than executing utcnow() again (use -1 to step back to the previous minute).

Once you have those variables, just use them to format the date string(s).

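The capture-once idea can be illustrated outside Data Factory. The following is a minimal Python sketch, not pipeline code; the function name and the sample timestamp are assumptions made purely for illustration. It suggests why shifting one captured timestamp first and formatting it afterwards keeps the folder path consistent when the hour rolls over.

```python
from datetime import datetime, timedelta, timezone


def previous_minute_folder(now: datetime) -> str:
    """Return the yyyy/MM/dd/HH folder that holds the previous minute's files."""
    # Shift the single captured timestamp first, then format it, so the
    # hour (and day, month, year) roll back together with the minute.
    prev = now - timedelta(minutes=1)
    return prev.strftime("%Y/%m/%d/%H")


# Hypothetical trigger time just after an hour boundary.
captured = datetime(2021, 3, 1, 10, 0, 5, tzinfo=timezone.utc)

# Formatting the unshifted time points at the new hour's folder, which does
# not contain the 09:59 files; this is the gap described in the question.
naive_folder = captured.strftime("%Y/%m/%d/%H")
safe_folder = previous_minute_folder(captured)

print(naive_folder)  # 2021/03/01/10
print(safe_folder)   # 2021/03/01/09
```

In the pipeline, the equivalent would be to format the shifted variable rather than a fresh utcnow() call.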

NOTE: use concat with formatDateTime to build the wildcard value you want.
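As a rough illustration of the wildcard step, here is a short Python sketch (again not pipeline code; the function name and sample file names are hypothetical). The leading `*` is an assumption based on the {sometext}_ prefix in the question's naming scheme. In Data Factory the same string would be assembled with concat and formatDateTime over the stored, shifted variable.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatch


def previous_minute_pattern(now: datetime) -> str:
    """Build a wildcard matching files stamped with the minute before `now`."""
    prev = now - timedelta(minutes=1)
    # Leading and trailing wildcards cover the {sometext} prefix and the
    # {someuuid} suffix from the question's naming scheme.
    return "*_" + prev.strftime("%Y_%m_%d_%H_%M") + "_*.json"


captured = datetime(2021, 3, 1, 10, 0, 5, tzinfo=timezone.utc)
pattern = previous_minute_pattern(captured)
print(pattern)  # *_2021_03_01_09_59_*.json

# Hypothetical file names used only to check the pattern.
print(fnmatch("events_2021_03_01_09_59_1b2c.json", pattern))  # True
print(fnmatch("events_2021_03_01_10_00_9f8e.json", pattern))  # False
```

Because every date part comes from the same shifted timestamp, the pattern stays correct across hour and day boundaries.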

Upvotes: 2
