harshith

Reputation: 91

How to use output of ADF stage in databricks notebook

I'm using the Get Metadata stage in ADF to get the child items (file names) and file types. I need to pattern match the file names in a Databricks notebook. How can I use the output of the Get Metadata stage in a Databricks notebook and pattern match on part of each file name?

Upvotes: 0

Views: 905

Answers (1)

Rakesh Govindula

Reputation: 11454

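The expression activity('Get Metadata1').output.childItems returns the file names and file types as an array. However, passing an array from ADF to a Databricks notebook through the Notebook activity is currently not supported.

So convert it to a string (for example by wrapping the expression in @string(...)), pass it as a parameter of the Notebook activity, and do the remaining work in the Databricks notebook. The notebook code below reads it from a parameter called names.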

Please follow the demonstration below:

  • First, create a linked service for the Databricks workspace in ADF and assign it to the Notebook activity.
  • Then create the notebook in Databricks and select it in the Notebook activity in ADF. You can read more about this in the documentation. Here my notebook is named forADF.

  • Now go to the Databricks notebook and use the code below.

Code:

data=dbutils.widgets.get("names")   # string passed in from the Get Metadata activity
mypattern="pattern"   # Our Pattern
print(data)
print(type(data))
print(mypattern)

data=data[1:len(data)-1]   # To remove first [ and last ]
data=data.replace('{','')  # removes all { in data
data=data.replace('}','')  # removes all } in data
print(data)

files_list=data.split(",")   # makes an array of file names and types as strings.
print(files_list)

result=[]
for i in range(0,len(files_list),2):   # File names sit at even indexes; each one is followed by its type
    if(mypattern in files_list[i]):
        t=files_list[i]+" -> "+files_list[i+1]+" -> "+"Pattern - True"
        result.append(t)
    else:
        t=files_list[i]+" -> "+files_list[i+1]+" -> "+"Pattern - False"
        result.append(t)
        
for i in result:
    print(i)

My Outputs for your reference:

  1. Get the parameter from the activity:

  2. Do some operations on it to prepare for the pattern match:

  3. Pattern match and set true or false.
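
The manual stripping and splitting in the notebook code above could alternatively be done with Python's json module, which avoids problems when a file name contains a comma or a brace. This is only a minimal sketch: it assumes the string passed from ADF is valid JSON of the form [{"name": ..., "type": ...}, ...], and the helper name match_child_items and the sample string are hypothetical.

```python
import json

def match_child_items(raw, pattern):
    """Parse the childItems string and flag the names that contain `pattern`."""
    # Assumption: raw is a JSON array of objects with "name" and "type" keys.
    items = json.loads(raw)
    return [(item["name"], item["type"], pattern in item["name"]) for item in items]

# Hypothetical sample of what the "names" parameter might contain.
sample = '[{"name":"sales_pattern_01.csv","type":"File"},{"name":"readme.txt","type":"File"}]'

for name, kind, matched in match_child_items(sample, "pattern"):
    print(f"{name} -> {kind} -> Pattern - {matched}")
```

In the notebook, the sample string would be replaced by dbutils.widgets.get("names").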

  • If you want to avoid the file type, use a ForEach activity to store only the file names in an array variable, as in this SO thread, and pass that variable to the Notebook activity as a string.

    @string(variables('variable_name'))

  • Now you can remove the [ and ], split the string, and store the list of file names in the Databricks code. You can then do step 3 above by iterating one position at a time instead of two.
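
For the names-only variant, a rough sketch of the notebook side might look like the following. It assumes the variable arrives as a JSON array of strings, and it uses re.search instead of the substring test in case a plain "in" check is not expressive enough. The helper name filter_names, the sample names, and the regular expression are all made up for illustration.

```python
import json
import re

def filter_names(raw, regex):
    """Return the file names in `raw` that match the regular expression."""
    # Assumption: raw is a JSON array of strings, e.g. '["a.csv","b.csv"]'.
    names = json.loads(raw)
    compiled = re.compile(regex)
    return [name for name in names if compiled.search(name)]

# Hypothetical sample of the string passed from the array variable.
sample_names = '["sales_2023.csv","sales_2024.csv","notes.txt"]'

print(filter_names(sample_names, r"^sales_\d{4}\.csv$"))
```

In the notebook, sample_names would again come from dbutils.widgets.get with whatever parameter name is set in the Notebook activity.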

Upvotes: 1
