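I want to use Python to do a relatively simple task: copy every directory and file from a container in one Azure storage account into a container in a second storage account.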
I'm doing this from a Databricks notebook, using the Python package for interacting with Azure Storage. I create two DataLakeServiceClient objects, one for each storage account, and then the relevant directory and file clients for my source and destination files.
What methods would I use to read the contents of my source file client and then write its contents to the destination file client?
I have the following code:
from azure.storage.filedatalake import DataLakeServiceClient

source_service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName="+source_account+";AccountKey="+source_account_key+";EndpointSuffix=core.windows.net")
destination_service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName="+destination_account+";AccountKey="+destination_account_key+";EndpointSuffix=core.windows.net")

source_file_system_client = source_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")
try:
    destination_file_system_client = destination_service_client.create_file_system(file_system="<CONTAINER_NAME>")
except Exception as e:
    print(e)

source_paths = source_file_system_client.get_paths(path="")
for path in source_paths:
    # get the file
    if path.is_directory:
        source_directory_client = source_file_system_client.get_directory_client(path)
        destination_directory_client = destination_file_system_client.get_directory_client(path)
        try:
            destination_directory_client.create_directory()
        except Exception as e:
            print(e)
    else:
        source_file_client = source_file_system_client.get_file_client(path)
        source_file_contents = source_file_client.download_file()
        source_downloaded_bytes = source_file_contents.readall()
        destination_file_client = destination_file_system_client.get_file_client(path)
        try:
            destination_file_client.create_file()
            # THIS IS WHERE HELP IS NEEDED, I've tried the following without success
            destination_file_client.append_data(data=source_file_contents, offset=0)
        except Exception as e:
            print("could not write file " + str(e))
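This happens because the appended data is never flushed. Data written with append_data() stays uncommitted until flush_data() is called with the final length of the file. Try adding destination_file_client.flush_data(len(source_downloaded_bytes)) after the append_data() call. Note that the code below also passes the downloaded bytes (source_downloaded_bytes) to append_data() rather than the downloader object (source_file_contents).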
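To illustrate how the offset passed to append_data() relates to the length passed to flush_data(), here is a minimal sketch that copies a single file in chunks. The helper name copy_file_in_chunks is hypothetical, and it assumes both arguments are DataLakeFileClient objects like the ones in the question. It uses the downloader's chunks() iterator instead of readall(), so treat it as one possible variant rather than the required approach.

```python
def copy_file_in_chunks(source_file_client, destination_file_client):
    """Hypothetical helper: stream one file to the destination, then commit it.

    Assumes both arguments behave like DataLakeFileClient objects.
    Returns the number of bytes written.
    """
    downloader = source_file_client.download_file()
    destination_file_client.create_file()

    offset = 0
    for chunk in downloader.chunks():
        # Each append stages data at the given offset; nothing is committed yet.
        destination_file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)

    # flush_data commits the staged data; its argument is the final file length.
    destination_file_client.flush_data(offset)
    return offset
```

Each append starts where the previous one ended, and the single flush at the end receives the running total, which is the same value as len(source_downloaded_bytes) when the whole file is appended in one call.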
Below is the complete code that worked for me.
# source_service_client and destination_service_client are created as in the question.
source_file_system_client = source_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")
try:
    destination_file_system_client = destination_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")
except Exception as e:
    print(e)

source_paths = source_file_system_client.get_paths(path="")
for path in source_paths:
    # get the file
    if path.is_directory:
        source_directory_client = source_file_system_client.get_directory_client(path)
        destination_directory_client = destination_file_system_client.get_directory_client(path)
        try:
            destination_directory_client.create_directory()
        except Exception as e:
            print(e)
    else:
        source_file_client = source_file_system_client.get_file_client(path)
        source_file_contents = source_file_client.download_file()
        source_downloaded_bytes = source_file_contents.readall()
        destination_file_client = destination_file_system_client.get_file_client(path)
        try:
            destination_file_client.create_file()
            # append the downloaded bytes (not the downloader object) ...
            destination_file_client.append_data(data=source_downloaded_bytes, offset=0)
            # ... then commit them by flushing up to the total length
            destination_file_client.flush_data(len(source_downloaded_bytes))
        except Exception as e:
            print("could not write file " + str(e))
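As a possible alternative to the create_file() / append_data() / flush_data() sequence, the file client also has an upload_data() method that writes and commits the content in one call. The sketch below is only an illustration under a few assumptions: the function name copy_all_files is hypothetical, both arguments are assumed to be FileSystemClient objects like the ones above, and each file is assumed to be small enough to hold in memory.

```python
def copy_all_files(source_file_system_client, destination_file_system_client):
    """Hypothetical helper: copy every directory and file to the same path
    in the destination file system. Returns the list of copied file paths.
    """
    copied = []
    for path in source_file_system_client.get_paths():
        if path.is_directory:
            # Recreate the directory under the same name in the destination.
            destination_file_system_client.get_directory_client(path.name).create_directory()
            continue

        # Read the whole source file into memory.
        data = source_file_system_client.get_file_client(path.name).download_file().readall()

        # overwrite=True creates the file (or replaces an existing one) and
        # commits the content, so no separate flush_data() call is needed here.
        destination_file_system_client.get_file_client(path.name).upload_data(data, overwrite=True)
        copied.append(path.name)
    return copied
```

With the clients from the code above, this would be called as copy_all_files(source_file_system_client, destination_file_system_client). Unlike the loop above, it does not catch exceptions, so a failed copy stops the run instead of being printed.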
(Screenshots omitted: the container contents in storage account 1 and in storage account 2.)