harshith

Reputation: 91

How to pass the outputs of Get Metadata stages to a Databricks notebook and use them for file name comparison

I have two Get Metadata stages in ADF that fetch file names from two different folders. I need to use these outputs for file name comparison in a Databricks notebook and return true if all the files are present.

How do I pass the output of the Get Metadata stages to Databricks, perform the string comparison, and return true if all files are present or false if even one file is missing?

How can I achieve this?

Upvotes: 1

Views: 1729

Answers (1)

Chaitanya

Reputation: 73

The answer below uses one Get Metadata activity; the same approach can be replicated for more than one.

Create an ADF pipeline with a Get Metadata activity connected to a Databricks Notebook activity (screenshot: sample ADF pipeline to replicate the scenario).

In the Get Metadata activity, add childItems to the Field list so that its output can be passed to the notebook (screenshot: argument list in Get Metadata).

In the Databricks Notebook activity, add the following base parameter (named metadata_output, to match the widget read in the notebook code below). It captures the output of Get Metadata and passes it to the notebook as an input parameter. The childItems output is normally of object type, so I converted it to a string to access the file names in the notebook (screenshot: Base Parameter in Notebook activity).

@string(activity('Get Metadata1').output.childItems)
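Because @string() serializes childItems as a JSON array, the notebook can parse it with the standard json module. The sketch below assumes each entry has the shape {"name": "...", "type": "File"} or {"name": "...", "type": "Folder"}; the function name extract_file_names and the sample string are hypothetical and only illustrate the parsing step, skipping subfolders so that only file names are compared.

```python
import json
from typing import List


def extract_file_names(metadata_json: str) -> List[str]:
    """Return the file names from a serialized childItems array.

    Assumption: each entry looks like {"name": ..., "type": "File"} or
    {"name": ..., "type": "Folder"}; folder entries are skipped.
    """
    child_items = json.loads(metadata_json)
    return [item["name"] for item in child_items if item.get("type") == "File"]


# Hypothetical example of the string the base parameter might contain.
sample = (
    '[{"name":"File1.csv","type":"File"},'
    '{"name":"archive","type":"Folder"},'
    '{"name":"File2.csv","type":"File"}]'
)
print(extract_file_names(sample))  # ['File1.csv', 'File2.csv']
```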

The Get Metadata output can now be accessed as a string in the notebook:

import json

# File names that must be present; compared against the Get Metadata output.
required_filenames = ['File1.csv', 'File2.csv', 'File3.csv']

# Read the Get Metadata output passed in as the 'metadata_output' base parameter.
metadata_value = dbutils.widgets.get('metadata_output')

# @string() produces a JSON array, so parse it back into a list of dicts.
metadata_list = json.loads(metadata_value)

# Collect the names of the files returned by the Get Metadata activity.
blob_output_list = [item['name'] for item in metadata_list]

# True only if every required file name appears in the folder listing.
validateif = all(item in blob_output_list for item in required_filenames)

# Return the result to ADF as the notebook's exit value ("True" or "False").
dbutils.notebook.exit(str(validateif))
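For the two-folder case in the question, one possible interpretation is that every file in the first folder must also exist in the second. The sketch below assumes that interpretation and compares two serialized childItems arrays as sets; the function name compare_listings and the sample data are hypothetical. Returning the missing names alongside the boolean makes a False result easier to diagnose.

```python
import json
from typing import List, Tuple


def compare_listings(source_json: str, target_json: str) -> Tuple[bool, List[str]]:
    """Check that every name in source_json also appears in target_json.

    Both arguments are serialized childItems arrays, one per Get Metadata
    activity. Returns (all_present, sorted list of missing names).
    """
    source_names = {item["name"] for item in json.loads(source_json)}
    target_names = {item["name"] for item in json.loads(target_json)}
    missing = sorted(source_names - target_names)
    return (not missing, missing)


# Hypothetical listings from two Get Metadata activities.
folder1 = '[{"name":"File1.csv","type":"File"},{"name":"File2.csv","type":"File"}]'
folder2 = '[{"name":"File1.csv","type":"File"}]'
print(compare_listings(folder1, folder2))  # (False, ['File2.csv'])
```

In the notebook, the two strings would come from two base parameters, for example `dbutils.widgets.get('metadata_output_1')` and `dbutils.widgets.get('metadata_output_2')` (hypothetical names, one per Get Metadata activity), and the boolean could be returned with `dbutils.notebook.exit(str(ok))`.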

I tried this approach and it meets the requirement. Hope this helps.

Upvotes: 1
