Cornel Verster

Reputation: 1781

Read a file on an Azure storage account, then write it to another using Python

I want to use Python to do a relatively simple task:

  1. Read the contents of a file on a storage account
  2. Then write those contents to a new file on another storage account

I'm doing this from a Databricks notebook, and I've tried using the Python package for interacting with Azure Storage. I create two DataLakeServiceClient instances, one per storage account, then create the relevant directory and file clients for my source and destination files.

Which methods should I use to read the contents of my source file client and then write its contents to the destination file client?

I have the following code:

from azure.storage.filedatalake import DataLakeServiceClient

source_service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName="+source_account+";AccountKey="+source_account_key+";EndpointSuffix=core.windows.net")
destination_service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName="+destination_account+";AccountKey="+destination_account_key+";EndpointSuffix=core.windows.net")

source_file_system_client = source_service_client.get_file_system_client(file_system=*container*)
try:
    destination_file_system_client = destination_service_client.create_file_system(file_system=*container*)
except Exception as e:
    print(e)

source_paths = source_file_system_client.get_paths(path="")
for path in source_paths:
    # get the file

    if path.is_directory:
        source_directory_client = source_file_system_client.get_directory_client(path)
        destination_directory_client = destination_file_system_client.get_directory_client(path)
        try:
            destination_directory_client.create_directory()
        except Exception as e:
            print(e)
    else:
        source_file_client = source_file_system_client.get_file_client(path)
        source_file_contents = source_file_client.download_file()
        source_downloaded_bytes = source_file_contents.readall()

        destination_file_client = destination_file_system_client.get_file_client(path)
        try:
            destination_file_client.create_file()

            # THIS IS WHERE HELP IS NEEDED, I've tried the following without success
            destination_file_client.append_data(data=source_file_contents, offset=0)
        except Exception as e:
            print("could not write file " + str(e))

Upvotes: 0

Views: 520

Answers (1)

SwethaKandikonda

Reputation: 8234

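The write does not take effect because the appended data is never flushed: data written with append_data() stays uncommitted until flush_data() is called. Add destination_file_client.flush_data(len(source_downloaded_bytes)) after the append_data() call, and pass the downloaded bytes (source_downloaded_bytes) to append_data() rather than the source_file_contents downloader object.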
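As a minimal sketch of the append-then-flush sequence in isolation: the helper name write_bytes is hypothetical, and file_client is assumed to be a DataLakeFileClient (or any object exposing the same create_file, append_data and flush_data methods).

```python
def write_bytes(file_client, data: bytes) -> None:
    """Create the file behind file_client and commit `data` to it.

    `file_client` is assumed to behave like a DataLakeFileClient.
    The helper name is illustrative, not part of the SDK.
    """
    # Create (or replace) the destination file; it starts out empty.
    file_client.create_file()
    if not data:
        # Nothing to append, so the empty file created above is the result.
        return
    # Stage the bytes at offset 0. They are uploaded but not yet committed.
    file_client.append_data(data=data, offset=0, length=len(data))
    # Commit the staged bytes. The argument is the total length of the
    # file after the flush, here simply len(data).
    file_client.flush_data(len(data))
```

With the clients from the question, the call would look like write_bytes(destination_file_client, source_downloaded_bytes). For small files, DataLakeFileClient.upload_data(data, overwrite=True) may be a simpler alternative, since it handles the append and flush steps in one call.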

Below is the complete code that worked for me.

source_file_system_client = source_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")
try:
    destination_file_system_client = destination_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")
except Exception as e:
    print(e)

source_paths = source_file_system_client.get_paths(path="")
for path in source_paths:
    # get the file
        
    if path.is_directory:
        source_directory_client = source_file_system_client.get_directory_client(path)
        destination_directory_client = destination_file_system_client.get_directory_client(path)
        try:
            destination_directory_client.create_directory()
        except Exception as e:
            print(e)
    else:
        source_file_client = source_file_system_client.get_file_client(path)
        source_file_contents = source_file_client.download_file()
        source_downloaded_bytes = source_file_contents.readall()
        
        destination_file_client = destination_file_system_client.get_file_client(path)
        try:
            destination_file_client.create_file()

            destination_file_client.append_data(data=source_downloaded_bytes, offset=0)
            destination_file_client.flush_data(len(source_downloaded_bytes))
        except Exception as e:
            print("could not write file " + str(e))
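For large files, reading everything into memory with readall() may not be desirable. The following is a hedged sketch of a streaming variant, not something tested against the asker's accounts: the helper name copy_file_in_chunks is hypothetical, and it assumes the object returned by download_file() exposes chunks(), as the SDK's StorageStreamDownloader does.

```python
def copy_file_in_chunks(source_file_client, destination_file_client) -> int:
    """Stream a file from source to destination; return the bytes copied.

    Both arguments are assumed to behave like DataLakeFileClient objects.
    The helper name is illustrative, not part of the SDK.
    """
    downloader = source_file_client.download_file()
    destination_file_client.create_file()

    offset = 0
    for chunk in downloader.chunks():
        # Each chunk is appended at the running offset, i.e. directly
        # after the bytes staged so far.
        destination_file_client.append_data(
            data=chunk, offset=offset, length=len(chunk)
        )
        offset += len(chunk)

    if offset:
        # A single flush at the end commits all staged chunks; the
        # argument is the final length of the file.
        destination_file_client.flush_data(offset)
    return offset
```

Inside the loop from the answer, the else branch could then call copy_file_in_chunks(source_file_client, destination_file_client) instead of downloading, appending and flushing by hand.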

(Screenshots omitted: the files in storage account 1 and in storage account 2.)

Upvotes: 1
