Reputation: 51
I have some xlsm and xlsb files that I store in my data lake. 4 XLSB/ XLSM files are loaded in daily and I want to load these files into my data factory pipeline to process them.
I understand that data factory only reads in xlsx files. Currently, my approach is to convert the files outside of data factory before loading it in. However, as these files are quite big, converting to xlsx outside takes quite awhile.
However, I was wondering if there is anyway to convert it in data factory itself? Or if possible, is there a work around to directly use the xlsb and xlsm files in data factory?
Source: Folder in Data Lake storage gen 2 Destination: Output to another folder in Data lake
Upvotes: 1
Views: 961
Reputation: 11529
You can use Binary datasets for these kinds of files. Create Binary dataset for the source and sink of the ADLS gen2 and use copy activity.
These are my sample files:
In the source Binary dataset, give the file path till container in the dataset and use wild card path in the copy activity source(Now give the path till your folder) like below.
As you want these files to be copied daily, use Schedule trigger for the above pipeline and schedule it daily. And to get only the latest daily files, use the Filter by last modified in the copy activity source(Make sure to give this in UTC format and it should match your time zone).
Use @adddays(utcNow(),-1)
for startdate and @utcNow()
for the end date.
In the sink, just give the path till the target folder like below.
Now, execute the pipeline and your files will be copied to the destination folder like mine.
Upvotes: 1