Reputation: 386
I am ultimately trying to get the names of the partitions (ie DATE_ID=20211004) that have been modified within the last 24 hours. I've tried using the Get Metadata activity, but it seems to only iterate partition folders, and not the files within the partitioned folders. I need to get the last modified date of the partitioned files within the folders, but I'm not sure how to do that.
Folder structure:
product_data/
|---------DATE_ID=20211002/
|---------|---------part-00011-tid-345678900-abc123-6793-1.c000.snappy.parquet
|---------|---------committed_123456789
|---------DATE_ID=20211003/
|---------|---------part-00086-tid-345678900-abc123-6756-2.c000.snappy.parquet
|---------|---------committed_123456789
|---------DATE_ID=20211004/
|---------|---------part-00042-tid-345678900-abc123-6712-1.c000.snappy.parquet
|---------|---------committed_123456789
In the example above, I'd like to get the last modified date and full name of each of the parquet files.
I've tried a variety of For Each + get Metadata activity combinations such as the one below:
Upvotes: 0
Views: 6030
Reputation: 5034
You would have to set Start time (UTC) to @getPastTime(24,'Hour')
and leave the End time (UTC) as NULL to fetch all the files modified within last 24hrs from current timestamp. Using this would only output childitems, that are files and not folders. This is by design.
Create source dataset with path being the root for partitioned data. Use Get Metadata
activity to list the files in that folder.
Assign the output list of files to an array variable:
@activity('Get Metadata').output.childItems
Next, you can use the meta data option lastModified to get the Last modified datetime of each file or folder.
Inside foreach
activity @variables('items')
, create a clone of source file and add a parameter to dynamically get the files based on each file we receive inside foreach.
Inside foreach activity
:
We then append these lastmodified
timestamp received to another variable (you can customize this as per your need)
@concat(item().name, ' ', 'LastModified:' , activity('Get Metadata1').output.lastModified)
Finally to list all the results, store the result from appended variable to result variable named final
.
{
"name": "final",
"value": [
"SourceAAD.csv LastModified:2021-09-29T13:15:07Z",
"mysource.csv LastModified:2021-09-27T11:56:33Z",
"new adf.xlsx LastModified:2021-09-29T13:16:01Z",
"sourceDataTime.parquet LastModified:2021-09-13T05:42:43Z",
"table3_CustomerAddress.csv LastModified:2021-10-04T06:01:12Z",
"table4_product.csv LastModified:2021-10-04T06:01:13Z",
"table5_ProductCategory.csv LastModified:2021-10-04T06:01:12Z",
"table6_ProductDescription.csv LastModified:2021-10-04T06:01:12Z",
"testSource.json LastModified:2021-08-30T04:19:54Z",
"testSource.txt LastModified:2021-08-30T04:17:56Z",
"userdata2.parquet LastModified:2021-09-13T03:57:27Z",
"userdataCopy.parquet LastModified:2021-09-13T03:57:28Z",
"userdataOriginal.parquet LastModified:2021-09-13T03:56:51Z"
]
}
Upvotes: 1