Aiden

Reputation: 139

Azure Data Flow generates a file with an unrecognized extension when creating a new folder path

Hi, I am trying to move data (input: Parquet file, output: JSON file) to a new folder in Blob storage using a Data Flow activity. However, it somehow generates a blank file that has exactly the same name as the new folder. If I switch on "Show deleted blobs" on the container, these files are hidden. Can anyone help me avoid creating these files? Thanks!!

enter image description here


Upvotes: 0

Views: 815

Answers (1)

KarthikBhyresh-MT

Reputation: 5044

This is by design when you upload or create directories, or when you use services like ADF to create directories. See this remark by Amanda Nguyen on an azure-storage-fuse issue on GitHub:

This is actually expected behavior by blobfuse and implemented by design. I'm sorry there isn't currently a way we can prevent blobfuse from doing this and to remove those files after creation.

I just tried a similar setup, and there seems to be no issue specific to using a Parquet file as input and a JSON file as output. This is expected behavior and is by design. When Azure storage is used as the sink, a number of files/blobs are created and deleted alongside the result as intermediate steps of the activity.

When the Data Flow runs successfully, you will see the following:

enter image description here

enter image description here

When you enable "Show deleted blobs":

enter image description here

The "Show deleted blobs" button natively shows the list of deleted blobs. It is a feature under Data Protection for the Blob service, and you can enable it through the portal.

enter image description here

If you are using Azure Blob Storage and not Data Lake, the account does not have a hierarchical namespace. You have to use Data Lake Storage Gen2 to be able to create real folders. I am assuming you provided a folder path as the sink in the Data Flow activity while using Azure Blob Storage.
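To illustrate what "no hierarchical namespace" means in practice, here is a minimal sketch in plain Python (no Azure SDK involved). It assumes the usual "/" delimiter, and the blob names and the `virtual_folders` helper are made up for the example. In a flat namespace, a "folder" is only a shared prefix in blob names, so the folders can be derived from the names alone:

```python
def virtual_folders(blob_names, delimiter="/"):
    """Derive the 'folders' implied by blob names in a flat namespace.

    Nothing is stored for these folders; each one is just a prefix
    shared by one or more blob names.
    """
    folders = set()
    for name in blob_names:
        parts = name.split(delimiter)[:-1]  # drop the file name itself
        for depth in range(1, len(parts) + 1):
            folders.add(delimiter.join(parts[:depth]))
    return sorted(folders)


# Hypothetical blob names, similar to what a Data Flow sink might write.
names = [
    "input/data.parquet",
    "output/2021/part-00000.json",
    "output/2021/part-00001.json",
]
print(virtual_folders(names))  # ['input', 'output', 'output/2021']
```

Because no real directory object exists, a tool that wants an (empty) folder to be visible has to write something, which is where a zero-byte blob named after the folder comes from.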

The reason this file is visible by default and hidden when "Show deleted blobs" is selected is a mismatch or absence of the right properties and tags. Below is a comparison of a blob that was actually deleted and one of these blobfuse-style blobs.

enter image description here

For the record, the product team is working on a fix for AzCopy. Meanwhile, you can try:

  1. Calling the Blob APIs yourself to delete the blobs.
  2. Using whatever you originally used to create the blobs to delete them (for example, cleaning up files with the built-in Delete activity in Azure Data Factory).
  3. Removing the hdi_isfolder=true metadata from the blobs (right click -> Properties) and then deleting them. For this solution, we are not 100% sure what effect deleting these blobs will have on the tool that created them, so we strongly recommend making sure this is safe in the context of your tool.
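As a rough sketch of option 1, the snippet below finds zero-byte blobs that carry the hdi_isfolder=true metadata and deletes them. It assumes the object you pass in behaves like `azure.storage.blob.ContainerClient`, using its `list_blobs(name_starts_with=..., include=["metadata"])` and `delete_blob(name)` methods. The `FakeContainer` class and the blob names are hypothetical stand-ins so that the example runs without an Azure account. This is not an official cleanup procedure, so it defaults to a dry run.

```python
from types import SimpleNamespace


def is_folder_marker(blob) -> bool:
    """True for a zero-byte blob carrying hdi_isfolder=true metadata."""
    metadata = getattr(blob, "metadata", None) or {}
    flag = str(metadata.get("hdi_isfolder", "")).lower()
    return getattr(blob, "size", None) == 0 and flag == "true"


def delete_folder_markers(container_client, prefix=None, dry_run=True):
    """Find folder-marker blobs under `prefix`; delete them unless dry_run.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (list_blobs / delete_blob).
    Returns the names of the marker blobs that were found.
    """
    found = []
    blobs = container_client.list_blobs(name_starts_with=prefix, include=["metadata"])
    for blob in blobs:
        if is_folder_marker(blob):
            found.append(blob.name)
            if not dry_run:
                container_client.delete_blob(blob.name)
    return found


class FakeContainer:
    """Hypothetical in-memory stand-in so the sketch runs without Azure."""

    def __init__(self, blobs):
        self.blobs = list(blobs)

    def list_blobs(self, name_starts_with=None, include=None):
        prefix = name_starts_with or ""
        return [b for b in self.blobs if b.name.startswith(prefix)]

    def delete_blob(self, name):
        self.blobs = [b for b in self.blobs if b.name != name]


demo = FakeContainer([
    SimpleNamespace(name="output", size=0, metadata={"hdi_isfolder": "true"}),
    SimpleNamespace(name="output/part-00000.json", size=2048, metadata={}),
])
print(delete_folder_markers(demo))                 # dry run: only reports the marker
print(delete_folder_markers(demo, dry_run=False))  # deletes the marker blob
print([b.name for b in demo.blobs])                # the data file is left in place
```

Against a real account you would pass an actual client instead of `FakeContainer`, for example one created with `ContainerClient.from_connection_string(conn_str, container_name)`. Run it with `dry_run=True` first, since the caveat in option 3 about the tool that created these blobs applies here as well.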

Upvotes: 3
