dmn

Reputation: 23

Does the OS version need to match for Dask client and scheduler/workers?

I have a Dask cluster running on Fargate/ECS. The client runs on an EC2 instance and submits a job to the cluster that subsets a Zarr dataset and averages it along particular dimensions. The job fails with a pickle error:

2024-09-21 20:50:04,023 - distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x05\x95\xde\x05\x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x12unpickle_exception\x94\x93\x94(\x8c\x08builtins\x94\x8c\x0cRuntimeError\x94\x93\x94\x8c\xfaError during deserialization of the task graph. This frequently\noccurs if the Scheduler and Client have different environments.\nFor more information, see\nhttps://docs.dask.org/en/stable/deployment-considerations.html#consistent-software-environments\n\x94\x85\x94h\x02(h\x03\x8c\x13ModuleNotFoundError\x94\x93\x94\x8c!No module named \'averaging_utils\'\x94\x85\x94Nh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\x08f_locals\x94}\x94\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c\x15distributed.scheduler\x94\x8c\x08__file__\x94\x8c@/opt/conda/lib/python3.12/site-packages/distributed/scheduler.py\x94u\x8c\x06f_code\x94h\x0e\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h\x1a\x8c\x07co_name\x94\x8c\x0cupdate_graph\x94\x8c\x0bco_argcount\x94K\x00\x8c\x11co_kwonlyargcount\x94K\x00\x8c\x0bco_varnames\x94)\x8c\nco_nlocals\x94K\x00\x8c\x0cco_stacksize\x94K\x00\x8c\x08co_flags\x94K@\x8c\x0eco_firstlineno\x94K\x00ub\x8c\x08f_lineno\x94M\xc9\x12ubM\x92\x12h\x0e\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h\x10)\x81\x94}\x94(h\x13}\x94h\x15}\x94(h\x17\x8c\x1edistributed.protocol.serialize\x94h\x19\x8cI/opt/conda/lib/python3.12/site-packages/distributed/protocol/serialize.py\x94uh\x1bh\x1d)\x81\x94}\x94(h h5h!\x8c\x0bdeserialize\x94h#K\x00h$K\x00h%)h&K\x00h\'K\x00h(K@h)K\x00ubh*M\xc4\x01ub\x8c\ttb_lineno\x94M\xc4\x01\x8c\x07tb_next\x94h,)\x81\x94}\x94(h/h\x10)\x81\x94}\x94(h\x13}\x94h\x15}\x94(h\x17h4h\x19h5uh\x1bh\x1d)\x81\x94}\x94(h 
h5h!\x8c\x0cpickle_loads\x94h#K\x00h$K\x00h%)h&K\x00h\'K\x00h(K@h)K\x00ubh*Koubh9Koh:h,)\x81\x94}\x94(h/h\x10)\x81\x94}\x94(h\x13}\x94h\x15}\x94(h\x17\x8c\x1bdistributed.protocol.pickle\x94h\x19\x8cF/opt/conda/lib/python3.12/site-packages/distributed/protocol/pickle.py\x94uh\x1bh\x1d)\x81\x94}\x94(h hKh!\x8c\x05loads\x94h#K\x00h$K\x00h%)h&K\x00h\'K\x00h(K@h)K\x00ubh*Kcubh9K^ububub\x87\x94R\x94N\x89Nt\x94R\x94}\x94\x8c\x04name\x94\x8c\x0faveraging_utils\x94sbh\rh\x10)\x81\x94}\x94(h\x13}\x94h\x15}\x94(h\x17h\x18h\x19h\x1auh\x1bh\x1d)\x81\x94}\x94(h h\x1ah!h"h#K\x00h$K\x00h%)h&K\x00h\'K\x00h(K@h)K\x00ubh*M\xc9\x12ubM\x9b\x12N\x87\x94R\x94hR\x88Nt\x94R\x94.'
Traceback (most recent call last):
  File "/home/ssm-user/anaconda3/envs/dm/lib/python3.12/site-packages/distributed/protocol/pickle.py", line 96, in loads
    return pickle.loads(x)
           ^^^^^^^^^^^^^^^
TypeError: unpickle_exception() takes 4 positional arguments but 7 were given
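
The payload that failed to deserialize still contains readable text, so it may help to pull out the printable strings before guessing at the cause. This is a minimal sketch, not something Dask provides: `payload` and `printable_runs` are hypothetical names, and the bytes are a short excerpt copied from the log line above.

```python
import re

# Short excerpt of the bytes from the "Failed to deserialize" log line above.
payload = (
    b"\x8c\x13ModuleNotFoundError\x94\x93\x94"
    b"\x8c!No module named 'averaging_utils'\x94\x85\x94"
)


def printable_runs(data: bytes, min_len: int = 8) -> list[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


strings = printable_runs(payload)
for s in strings:
    print(s)
```

A pickle length byte that happens to be printable (here the `!` before "No module named") stays attached to the string, so the output is only a rough reading aid.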

I ran client.get_versions(check=True) to check that the environment and package versions are consistent across the scheduler, workers, and client. This is the result:

{
    "scheduler": {
        "host": {
            "python": "3.12.4.final.0",
            "python-bits": 64,
            "OS": "Linux",
            "OS-release": "5.10.224-212.876.amzn2.x86_64",
            "machine": "x86_64",
            "processor": "x86_64",
            "byteorder": "little",
            "LC_ALL": "C.UTF-8",
            "LANG": "C.UTF-8"
        },
        "packages": {
            "python": "3.12.4.final.0",
            "dask": "2024.7.1",
            "distributed": "2024.7.1",
            "msgpack": "1.0.8",
            "cloudpickle": "3.0.0",
            "tornado": "6.4.1",
            "toolz": "0.12.0",
            "numpy": "2.0.0",
            "pandas": "2.2.2",
            "lz4": "4.3.3"
        }
    },
    "workers": {
        "tcp://10.5.163.145:33391": {
            "host": {
                "python": "3.12.4.final.0",
                "python-bits": 64,
                "OS": "Linux",
                "OS-release": "5.10.224-212.876.amzn2.x86_64",
                "machine": "x86_64",
                "processor": "x86_64",
                "byteorder": "little",
                "LC_ALL": "C.UTF-8",
                "LANG": "C.UTF-8"
            },
            "packages": {
                "python": "3.12.4.final.0",
                "dask": "2024.7.1",
                "distributed": "2024.7.1",
                "msgpack": "1.0.8",
                "cloudpickle": "3.0.0",
                "tornado": "6.4.1",
                "toolz": "0.12.0",
                "numpy": "2.0.0",
                "pandas": "2.2.2",
                "lz4": "4.3.3"
            }
        },
        "tcp://10.5.167.121:45117": {
            "host": {
                "python": "3.12.4.final.0",
                "python-bits": 64,
                "OS": "Linux",
                "OS-release": "5.10.224-212.876.amzn2.x86_64",
                "machine": "x86_64",
                "processor": "x86_64",
                "byteorder": "little",
                "LC_ALL": "C.UTF-8",
                "LANG": "C.UTF-8"
            },
            "packages": {
                "python": "3.12.4.final.0",
                "dask": "2024.7.1",
                "distributed": "2024.7.1",
                "msgpack": "1.0.8",
                "cloudpickle": "3.0.0",
                "tornado": "6.4.1",
                "toolz": "0.12.0",
                "numpy": "2.0.0",
                "pandas": "2.2.2",
                "lz4": "4.3.3"
            }
        }
    },
    "client": {
        "host": {
            "python": "3.12.4.final.0",
            "python-bits": 64,
            "OS": "Linux",
            "OS-release": "6.1.97-104.177.amzn2023.x86_64",
            "machine": "x86_64",
            "processor": "x86_64",
            "byteorder": "little",
            "LC_ALL": "None",
            "LANG": "C.UTF-8"
        },
        "packages": {
            "python": "3.12.4.final.0",
            "dask": "2024.7.1",
            "distributed": "2024.7.1",
            "msgpack": "1.0.8",
            "cloudpickle": "3.0.0",
            "tornado": "6.4.1",
            "toolz": "0.12.0",
            "numpy": "2.0.0",
            "pandas": "2.2.2",
            "lz4": "4.3.3"
        }
    }
}
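
Rather than comparing the output by eye, the sections can be diffed programmatically. This is a minimal sketch, assuming the dictionary above is stored in a variable called `versions`; that name and the `diff_section` helper are hypothetical, not part of Dask. Only a trimmed copy of the "host" sections is reproduced here.

```python
# Trimmed copy of the "host" sections from the get_versions output above.
versions = {
    "scheduler": {
        "host": {
            "python": "3.12.4.final.0",
            "OS-release": "5.10.224-212.876.amzn2.x86_64",
            "LC_ALL": "C.UTF-8",
            "LANG": "C.UTF-8",
        }
    },
    "client": {
        "host": {
            "python": "3.12.4.final.0",
            "OS-release": "6.1.97-104.177.amzn2023.x86_64",
            "LC_ALL": "None",
            "LANG": "C.UTF-8",
        }
    },
}


def diff_section(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


host_diff = diff_section(versions["scheduler"]["host"], versions["client"]["host"])
for key, (sched, cli) in host_diff.items():
    print(f"{key}: scheduler={sched!r} client={cli!r}")
```

The same helper can be applied to the "packages" sections, or to each worker entry in turn.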

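As you can see, the package versions are identical everywhere. The only differences are in the host sections: OS-release is "5.10.224-212.876.amzn2.x86_64" on the scheduler/workers and "6.1.97-104.177.amzn2023.x86_64" on the client, and LC_ALL is "C.UTF-8" on the scheduler/workers and "None" on the client. Could the OS difference be the issue?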


Answers (0)
