I am trying to implement a Python 3 library to create IPFS Merkle DAGs, but I have been unable to figure out the correct way to specify links.

Question

I've been working on a Python implementation of a Merkle DAG (Directed Acyclic Graph) with the goal of creating Content Addressable Archive (CAR) files. However, I've hit a roadblock and I'm struggling to figure out the correct way to specify links in the nodes. My Python 3 implementation is included below.

I'm using the multiformats library to generate CIDs for each chunk of data, and then I'm trying to create a Merkle DAG where each node contains links to its children. The end goal is to produce a CAR file.

I'm storing the CIDs of the chunks in the "links" field of the root node. However, I'm unsure whether this is done correctly. Are there any specific requirements for linking nodes in an IPLD Merkle DAG that I might be missing?

If anyone has experience with Merkle DAGs and CAR file creation in Python, could you please review my code and provide insights into the correct way to specify links in the nodes and generate a valid CAR file?

I appreciate any assistance or suggestions to help me move past this roadblock. Thank you!

from multiformats import CID, varint, multihash, multibase
import dag_cbor
import json
import msgpack

def generate_cid(data, codec="dag-pb"):
    hash_value = multihash.digest(data, "sha2-256")
    return CID("base32", version=1, codec=codec, digest=hash_value)

def generate_merkle_tree(file_path, chunk_size):
    cids = []

    # Read the file
    with open(file_path, "rb") as file:
        while True:
            # Read a chunk of data
            chunk = file.read(chunk_size)
            if not chunk:
                break

            # Generate CID for the chunk
            cid = generate_cid(chunk, codec="raw")
            cids.append((cid, chunk))

    # Generate Merkle tree root CID from all the chunks
    # root_cid = generate_cid(b"".join(bytes(cid[0]) for cid in cids))
    
    # Create the root node with links and other data
    root_node = {
        "file_name": "test.png",
        "links": [str(cid[0]) for cid in cids]
    }
    
    # Encode the root node as dag-pb
    root_data = dag_cbor.encode(root_node)
    
    # Generate CID for the root node
    root_cid = generate_cid(root_data, codec="dag-pb")
    
    return root_cid, cids, root_data

def create_car_file(root, cids):
    header_roots = [root]
    header_data = dag_cbor.encode({"roots": header_roots, "version": 1})
    header = varint.encode(len(header_data)) + header_data

    car_content = b""
    car_content += header
    for cid, chunk in cids:
        cid_bytes = bytes(cid)
        block = varint.encode(len(chunk) + len(cid_bytes)) + cid_bytes + chunk
        car_content += block
    
    root_cid = bytes(root)
    root_block = varint.encode(len(root_cid)) + root_cid
    car_content += root_block
    with open("output.car", "wb") as car_file:
        car_file.write(car_content)

file_path = "./AADHAAR.png"  # Replace with the path to your file
chunk_size = 16384  # Adjust the chunk size as needed

root, cids, root_data = generate_merkle_tree(file_path, chunk_size)
print(root)
create_car_file(root, cids)
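
For reference, CARv1 frames every block after the header, the root node included, as a varint holding len(CID) + len(data), followed by the binary CID and then the block's bytes. By that rule, an entry that carries only the root CID with no node bytes after it would not be a complete block. The sketch below illustrates the framing using only the standard library. The helpers `encode_varint`, `read_varint`, `cid_v1`, `car_block`, and `read_blocks` are hypothetical names for this illustration and are not part of `multiformats`; the root payload is a placeholder rather than real DAG-CBOR, and the reader assumes every CID is a 36-byte CIDv1 with a sha2-256 multihash.

```python
import hashlib
import io

RAW, DAG_CBOR, SHA2_256 = 0x55, 0x71, 0x12  # multicodec codes

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used across multiformats and CAR."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def read_varint(stream):
    """Read one varint from a stream; return None at a clean end of stream."""
    shift = result = 0
    while True:
        raw = stream.read(1)
        if not raw:
            return None
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7

def cid_v1(codec: int, data: bytes) -> bytes:
    """Binary CIDv1: version, codec, then a sha2-256 multihash of data."""
    digest = hashlib.sha256(data).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash

def car_block(cid: bytes, data: bytes) -> bytes:
    """One CAR block: varint(len(cid) + len(data)) | cid | data."""
    return encode_varint(len(cid) + len(data)) + cid + data

def read_blocks(section: bytes, cid_len: int = 36):
    """Split a CAR data section into (cid, data) pairs (fixed CID length assumed)."""
    stream = io.BytesIO(section)
    blocks = []
    while (length := read_varint(stream)) is not None:
        body = stream.read(length)
        blocks.append((body[:cid_len], body[cid_len:]))
    return blocks

# Placeholder payloads; in practice root_data would be the encoded root node.
chunks = [b"chunk one", b"chunk two"]
root_data = b"placeholder for the encoded root node"
root_cid = cid_v1(DAG_CBOR, root_data)

# The root block carries its data just like every other block.
section = car_block(root_cid, root_data)
for chunk in chunks:
    section += car_block(cid_v1(RAW, chunk), chunk)

blocks = read_blocks(section)
```

Reading the section back with `read_blocks` returns the root node's bytes alongside its CID, which is what a CAR consumer needs in order to decode the root and follow its links.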

I've been working on a Python implementation to create a Merkle DAG and subsequently generate a Content Addressable Archive (CAR) file.

I attempted to link nodes by storing the CIDs of the chunks in the "links" field of the root node. However, I'm uncertain whether I'm doing this correctly. My expectation was that each node would contain links to its children, but I'm unsure whether there are specific requirements for linking nodes in an IPLD Merkle DAG.
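
On the linking requirement itself: in DAG-CBOR a link is not a text string. It is CBOR tag 42 wrapping a byte string that holds a 0x00 prefix followed by the binary CID. The sketch below builds that byte layout by hand with the standard library, purely to show what a link looks like on the wire. The helper names `varint`, `cid_v1`, `cbor_head`, and `cbor_link` are hypothetical and are not part of `multiformats` or `dag_cbor`. With the `dag_cbor` package, passing `CID` objects rather than `str(cid)` is expected to produce this form, and a CID computed over DAG-CBOR bytes would normally use the `dag-cbor` codec (0x71) rather than `dag-pb`; both points are worth confirming against the package documentation.

```python
import hashlib

RAW, SHA2_256 = 0x55, 0x12  # multicodec codes

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def cid_v1(codec: int, data: bytes) -> bytes:
    """Binary CIDv1: version, codec, then a sha2-256 multihash of data."""
    digest = hashlib.sha256(data).digest()
    return varint(1) + varint(codec) + varint(SHA2_256) + varint(len(digest)) + digest

def cbor_head(major: int, n: int) -> bytes:
    """CBOR head for a major type with argument n, in shortest form (n < 2**32)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65536:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")

def cbor_link(cid: bytes) -> bytes:
    """DAG-CBOR link: tag 42 around a byte string of 0x00 + binary CID."""
    payload = b"\x00" + cid
    return cbor_head(6, 42) + cbor_head(2, len(payload)) + payload

chunk_cid = cid_v1(RAW, b"hello")
link = cbor_link(chunk_cid)

# A "links" list is then a CBOR array (major type 4) of such tagged values.
links_array = cbor_head(4, 1) + link
```

A decoder that understands DAG-CBOR can recognise tag 42 and traverse to the child block, whereas a CID stored as a plain string is just text to it.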
