H C

Reading and hashing a file from an SMB share with Python

I want to write a robust Python program that copies files from a GCP bucket (using Pub/Sub) to an NFS SMB share.

I managed to use the smbclient Python library, specifically the stat, makedirs and open_file functions. I use the wb and rb modes of open_file because I'm working with CSV files, and they don't have an EOF marker.

But I have an issue: when I pass the result of open_file(...).read() to hashlib.md5(), the read() call gets stuck inside my Docker container.

I had a lot of trouble getting my first smbclient write to work, and I know this library/protocol is painful to work with, to say the least.

I also read somewhere that the clock difference between servers may play a role. My NFS server is on PST and my Docker container is on GMT, but they show the same instant (about one minute of difference).
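To separate the time zone from the actual clock skew, the two clocks can be compared as timezone-aware timestamps. This is only a sketch: the readings below are made up (not taken from my servers), PST is modeled as a fixed UTC-8 offset, and `clock_skew` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST modeled as a fixed UTC-8 offset (no daylight saving).
PST = timezone(timedelta(hours=-8), "PST")
GMT = timezone.utc


def clock_skew(server_time: datetime, client_time: datetime) -> timedelta:
    """Return the absolute difference between two timezone-aware clocks."""
    return abs(server_time - client_time)


# Hypothetical readings: the NAS shows 02:24 PST, the container shows 10:23 GMT.
nas_now = datetime(2024, 12, 12, 2, 24, 0, tzinfo=PST)
docker_now = datetime(2024, 12, 12, 10, 23, 0, tzinfo=GMT)

skew = clock_skew(nas_now, docker_now)
print(skew)  # 0:01:00
```

The eight-hour zone offset disappears once both values are timezone-aware; only the one-minute skew remains.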

I didn't find good examples online, and the documentation at https://pypi.org/project/smbprotocol/ is sparse.

Here is the code in detail:

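import base64
import hashlib


def sign_md5_base64_from_file(content):
    """Return the base64-encoded MD5 digest of a bytes object."""
    md5_hash = hashlib.md5()

    # Feed the content to the hash in 4 KiB slices.
    chunk_size = 4096
    for i in range(0, len(content), chunk_size):
        md5_hash.update(content[i:i + chunk_size])

    return base64.b64encode(md5_hash.digest()).decode('utf-8')
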
import time

import smbclient
import smbprotocol.exceptions


def send_file(chemin_samba):

    # [variables initializations]

    # Check whether the remote file already exists.
    try:
        smbclient.stat(fullname_csv)
        fichier_existe = True
    except smbprotocol.exceptions.SMBOSError as e:
        # e.ntstatus is an integer, so compare it with the integer code
        # 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND), not with a string.
        if e.ntstatus == 0xC0000034:
            fichier_existe = False
        else:
            print(f"An error occurred: {e}")
            raise  # otherwise fichier_existe would be undefined below

    smbclient.makedirs(dossier_destination, exist_ok=True)

    if fichier_existe:
        time.sleep(1)
        with smbclient.open_file(fullname_csv, mode="rb", share_access="r") as remote_file:
            remote_content = remote_file.read()  # the call that gets stuck
            samba_md5 = utils.sign_md5_base64_from_file(remote_content)

        local_md5 = utils.sign_md5_base64_from_file_path(variables.CHEMIN_CSV_LOCAL)

        if local_md5 == samba_md5:
            return
        else:
            pass  # [branch body elided]

    else:
        logger.info(f"Writing the CSV file to the remote directory - fullname: {fullname_csv} - CSV name: {nom_csv}")
        with open(variables.CHEMIN_CSV_LOCAL, "rb") as local_file:
            with smbclient.open_file(fullname_csv, mode="wb") as remote_file:
                remote_file.write(local_file.read())
        logger.info(f"File {nom_csv} sent successfully to {chemin_samba}.")
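A side note on the existence check in the except branch: as far as I can tell, smbprotocol exposes `ntstatus` as an integer, so a comparison against a string such as "0xc0000034" can never match. The sketch below shows the difference in plain Python; `is_not_found` is a hypothetical helper, not part of the library.

```python
# 0xC0000034 is the NT status code STATUS_OBJECT_NAME_NOT_FOUND.
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034


def is_not_found(ntstatus: int) -> bool:
    """Hypothetical helper: True when the status means 'no such file'."""
    return ntstatus == STATUS_OBJECT_NAME_NOT_FOUND


print(0xC0000034 == "0xc0000034")  # False: an int never equals a str
print(is_not_found(0xC0000034))    # True
print(is_not_found(0xC0000022))    # False (STATUS_ACCESS_DENIED, a different code)
```

This is unrelated to the hang itself, since in my log the file exists and stat succeeds.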

And here is my log:

INFO:file_sender: makedirs OK
INFO:smbprotocol.open:Session: nfs_smb OK, Tree Connect: 
DEBUG:smbprotocol.transport:Socket recv() returned 4 bytes (total 4)
DEBUG:smbprotocol.transport:Socket recv(160) (total 160)
DEBUG:smbprotocol.transport:Socket recv() returned 160 bytes (total 160)
DEBUG:smbprotocol.transport:Socket recv(4) (total 4)
INFO:smbprotocol.open:Session: nfs_smb_truenas, Tree Connect:  - receiving SMB2 Close Response
INFO:file_sender: download file
INFO:smbprotocol.open:Session: , Tree Connect: nfs_smb - receiving SMB2 Create Response
DEBUG:smbprotocol.transport:Socket recv() returned 4 bytes (total 4)
DEBUG:smbprotocol.transport:Socket recv(152) (total 152)
DEBUG:smbprotocol.transport:Socket recv() returned 152 bytes (total 152)
DEBUG:smbprotocol.transport:Socket recv(4) (total 4)
DEBUG:smbprotocol.open:SMB2CreateResponse:
    structure_size = 89
    oplock_level = (0) SMB2_OPLOCK_LEVEL_NONE
    flag = 0
    create_action = (1) FILE_OPENED
    creation_time = 2024-12-12 10:23:27.226642+00:00
    last_access_time = 2024-12-12 10:23:27.226642+00:00
    last_write_time = 2024-12-12 10:23:27.234844+00:00
    change_time = 2024-12-12 10:23:27.234844+00:00
    allocation_size = 45568
    end_of_file = 44909
    file_attributes = (32) FILE_ATTRIBUTE_ARCHIVE
    reserved2 = 0
    file_id = 3D D2 B6 EF 00 00 00 00 13 DF 5B 6B 00 00 00 00
    create_contexts_offset = 0
    create_contexts_length = 0
    buffer = []

    Raw Hex:
        
INFO:file_sender:Open SMB RB OK
DEBUG:smbclient._io:Read 0 -65536.
INFO:smbprotocol.open:Session: nfs_smb_truenas, Tree Connect ID: NAS SMB
DEBUG:smbprotocol.transport:Socket recv() returned 4 bytes (total 4)
DEBUG:smbprotocol.transport:Socket recv(44989) (total 44989)
DEBUG:smbprotocol.transport:Socket recv() returned 13028 bytes (total 44989)
DEBUG:smbprotocol.transport:Socket recv(31961) (total 44989)

[the log gets stuck here until a timeout occurs or until I run docker stop my_docker]

On the SMB NFS server there is a lock on my CSV file that doesn't go away (which seems logical, since the .read() is stuck):

[~] smbstatus -L
Locked files:
Pid     User(ID)  DenyMode    Access  R/W     Oplock  SharePath           Name                   Time
---------------------------------------------------------------------------------------------------------
58763   1000      DENY_WRITE  0x89    RDONLY  NONE    /mnt/pool1/nfs_smb  path_to_file/test.csv  date PST

From what I can tell from my logs, the code gets stuck in the remote_file.read() call.

I'd like to be able to read the remote SMB file if it exists, then hash it and compare the MD5 of my local file with the MD5 of the remote one.
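A minimal sketch of the hash-and-compare step, reading in chunks instead of one big read(). The helper name `md5_base64_from_fileobj` and the 64 KiB chunk size are my own assumptions, and `io.BytesIO` objects stand in for the real files. The handle returned by smbclient.open_file(..., mode="rb") should be usable the same way since it is file-like, but I have not confirmed that chunked reads avoid the hang.

```python
import base64
import hashlib
import io


def md5_base64_from_fileobj(fileobj, chunk_size=64 * 1024):
    """Hash a binary file-like object chunk by chunk, return the base64 MD5."""
    md5_hash = hashlib.md5()
    # read(n) returns b"" at end of file, which ends the loop.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        md5_hash.update(chunk)
    return base64.b64encode(md5_hash.digest()).decode("utf-8")


# Stand-ins for the remote and local files (hypothetical content).
remote = io.BytesIO(b"id,name\n1,alice\n")
local = io.BytesIO(b"id,name\n1,alice\n")

print(md5_base64_from_fileobj(remote) == md5_base64_from_fileobj(local))  # True
```

Reading in bounded chunks also means the log would show how far the transfer got before stalling.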

If it does not exist, I create the directories and then write the file.
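The write step could also stream the file rather than load it fully into memory. This is a sketch under assumptions: `copy_stream` is a hypothetical helper, and the `io.BytesIO` objects stand in for open(local_path, "rb") and smbclient.open_file(remote_path, mode="wb").

```python
import io
import shutil


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in chunks instead of one read()/write() of the whole file."""
    shutil.copyfileobj(src, dst, chunk_size)


# Stand-ins for the local source file and the remote destination file.
local_file = io.BytesIO(b"id,name\n1,alice\n")
remote_file = io.BytesIO()

copy_stream(local_file, remote_file)
print(remote_file.getvalue() == b"id,name\n1,alice\n")  # True
```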
