Reputation: 91
I'm using the Get Metadata activity in ADF to get the child items (file names) and the file type. I need to pattern match the file names using a Databricks notebook. How do I use the output of the Get Metadata activity in a Databricks notebook and pattern match part of the file name strings?
Upvotes: 0
Views: 905
Reputation: 11454
The expression activity('Get Metadata1').output.childItems
gives the file names and file types as an array. But currently, passing an array parameter from the Notebook activity to Databricks is not supported.
So, pass it as a string and do the remaining work in the Databricks notebook.
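For example, in the Notebook activity's base parameters the array can be converted with the @string() function. A sketch of the relevant pipeline JSON, assuming the Get Metadata activity is named Get Metadata1 and the parameter is named names (to match dbutils.widgets.get("names") in the notebook code below):

```json
"baseParameters": {
    "names": "@string(activity('Get Metadata1').output.childItems)"
}
```

The notebook then receives one JSON-like string such as [{"name":"file1.csv","type":"File"},...] rather than an array.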
Please follow the demonstration below:
Code:
data = dbutils.widgets.get("names")  # string from the Get Metadata activity
mypattern = "pattern"                # our pattern
print(data)
print(type(data))
print(mypattern)

data = data[1:len(data)-1]     # remove the first [ and last ]
data = data.replace('{', '')   # remove all { in data
data = data.replace('}', '')   # remove all } in data
print(data)

files_list = data.split(",")   # list of alternating file names and file types as strings
print(files_list)

result = []
for i in range(0, len(files_list), 2):  # file names are 2 positions separated from start in the list
    if mypattern in files_list[i]:
        t = files_list[i] + " -> " + files_list[i+1] + " -> " + "Pattern - True"
    else:
        t = files_list[i] + " -> " + files_list[i+1] + " -> " + "Pattern - False"
    result.append(t)

for i in result:
    print(i)
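The parsing above can be exercised outside ADF by hard-coding a sample of what @string() of childItems would produce (the file names here are made up for illustration):

```python
# Standalone sketch: 'data' simulates the string ADF passes in;
# the sample file names are hypothetical.
data = '[{"name":"sales_2023.csv","type":"File"},{"name":"notes.txt","type":"File"}]'
mypattern = "sales"

data = data[1:len(data)-1]                    # remove the first [ and last ]
data = data.replace('{', '').replace('}', '') # remove all { and } in data
files_list = data.split(",")                  # alternating "name":... and "type":... entries

result = []
for i in range(0, len(files_list), 2):        # file names sit at every other position
    matched = mypattern in files_list[i]
    result.append(files_list[i] + " -> " + files_list[i+1]
                  + " -> Pattern - " + str(matched))

for line in result:
    print(line)
```

This prints one line per file, flagging whether the pattern matched its name.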
My Outputs for your reference:
If you want to avoid the file type, use a ForEach to store only the file names in an array variable, as shown in this SO thread, and pass it to the Notebook activity as a string:
@string(variables('variable_name'))
Now you can remove the [], split on commas, and store the list of file names in the Databricks code. You can then do the pattern-matching step above by iterating one position at a time instead of two.
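A minimal sketch of that names-only variant, assuming the array variable serializes to something like ["a.csv","b.txt"] (the sample names are made up):

```python
# Names-only variant: 'data' simulates @string(variables('variable_name'));
# the sample file names are hypothetical.
data = '["sales_2023.csv","notes.txt","sales_2024.csv"]'
mypattern = "sales"

data = data[1:len(data)-1]    # remove the first [ and last ]
files_list = data.split(",")  # every entry is now a file name

result = []
for name in files_list:       # iterate one position at a time
    result.append(name + " -> Pattern - " + str(mypattern in name))

for line in result:
    print(line)
```

Because there are no file types in the string, no stepping by 2 is needed.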
Upvotes: 1