Daniel Ferber

Reputation: 311

Mrjob fails to create cluster on dataproc: __init__() got an unexpected keyword argument 'channel'

I am trying to run the Hadoop MapReduce word count example on Google Cloud Dataproc using the Python mrjob library. However, mrjob fails with the following exception:

TypeError: __init__() got an unexpected keyword argument 'channel'
Traceback (most recent call last):
  File "freq.py", line 21, in <module>
    MRWordFreqCount.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 616, in run
    cls().execute()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 687, in execute
    self.run_job()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 636, in run_job
    runner.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/runner.py", line 503, in run
    self._run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 468, in _run
    self._launch()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 473, in _launch
    self._launch_cluster()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 637, in _launch_cluster
    self._get_cluster(self._cluster_id)
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 1188, in _get_cluster
    return self.cluster_client.get_cluster(
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 376, in cluster_client
    return google.cloud.dataproc_v1beta2.ClusterControllerClient(
TypeError: __init__() got an unexpected keyword argument 'channel'

I checked that GOOGLE_APPLICATION_CREDENTIALS is set correctly, that the required APIs are enabled on Google Cloud, and that the service account has all the required roles.

mrjob successfully uploads the files to Google Cloud Storage, but fails as soon as it tries to create a new Dataproc cluster.

What could be wrong?

Command line used to launch the mrjob job on Dataproc:

$ python3 freq.py -r dataproc words.txt

Current Python environment:

$ python3 -VV
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]

$ pip3 list | grep google
google-api-core          1.23.0
google-auth              1.23.0
google-auth-oauthlib     0.4.2
google-cloud-core        1.4.3
google-cloud-dataproc    2.0.2
google-cloud-logging     1.15.1
google-cloud-storage     1.32.0
google-crc32c            1.0.0
google-pasta             0.2.0
google-resumable-media   1.1.0
googleapis-common-protos 1.52.0

$ pip3 list | grep mrjob
mrjob                    0.7.4

Upvotes: 2

Views: 484

Answers (1)

Daniel Ferber

Reputation: 311

The solution was to downgrade google-cloud-dataproc to 1.1.1.

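After stepping through the mrjob implementation in a debugger, I discovered that mrjob 0.7.4 calls the constructor of google.cloud.dataproc_v1beta2.ClusterControllerClient with an argument (channel, per the traceback) that was renamed in the google-cloud-dataproc library as of version 2.0.0. The installed 2.0.2 release therefore rejects the call with the TypeError shown in the question.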
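The failure mode can be reproduced without any Google libraries. The sketch below is only an illustration: ClientV2Like is a hypothetical stand-in class, not the real ClusterControllerClient, and its transport parameter is an assumption chosen to show a constructor that does not accept channel.

```python
class ClientV2Like:
    """Hypothetical stand-in for a client whose constructor has no `channel` parameter."""

    def __init__(self, *, transport=None):
        # `transport` is an illustrative parameter name, not taken from the question.
        self.transport = transport


def try_construct(**kwargs):
    """Build the stand-in client; return the TypeError message on failure, else None."""
    try:
        ClientV2Like(**kwargs)
        return None
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    # An older caller still passes `channel`, which the newer signature rejects.
    print(try_construct(channel=object()))
    # A call that matches the newer signature succeeds.
    print(try_construct(transport=None))
```

Python raises the TypeError when the call is made, before any code in the constructor runs, which is why the upload to Cloud Storage succeeded and the error only appeared when mrjob first built the cluster client.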

How to downgrade with pip3:

$ pip3 install --force-reinstall --no-deps google-cloud-dataproc==1.1.1
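To confirm that the downgrade took effect before launching a job, a small check like the following may help. It is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper names are made up for illustration, and the "older than 2.0.0" threshold comes from the explanation above.

```python
from importlib import metadata  # in the standard library since Python 3.8


def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def dataproc_client_is_pre_2(version: str) -> bool:
    """True if the given google-cloud-dataproc version is older than 2.0.0."""
    return major_version(version) < 2


def installed_dataproc_version():
    """Return the installed google-cloud-dataproc version, or None if it is absent."""
    try:
        return metadata.version("google-cloud-dataproc")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_dataproc_version()
    if version is None:
        print("google-cloud-dataproc is not installed")
    elif dataproc_client_is_pre_2(version):
        print(f"google-cloud-dataproc {version}: older than 2.0.0")
    else:
        print(f"google-cloud-dataproc {version}: 2.0.0 or newer, mrjob 0.7.4 may fail")
```

Because the pip3 command above uses --no-deps, it replaces only google-cloud-dataproc itself, so this check looks at that one package and nothing else.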

Upvotes: 2
