Reputation: 311
I am trying to run the Hadoop MapReduce word count example on Google Cloud Dataproc using the Python mrjob library. However, mrjob fails with the following exception:
TypeError: __init__() got an unexpected keyword argument 'channel'
Traceback (most recent call last):
  File "freq.py", line 21, in <module>
    MRWordFreqCount.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 616, in run
    cls().execute()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 687, in execute
    self.run_job()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 636, in run_job
    runner.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/runner.py", line 503, in run
    self._run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 468, in _run
    self._launch()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 473, in _launch
    self._launch_cluster()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 637, in _launch_cluster
    self._get_cluster(self._cluster_id)
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 1188, in _get_cluster
    return self.cluster_client.get_cluster(
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 376, in cluster_client
    return google.cloud.dataproc_v1beta2.ClusterControllerClient(
TypeError: __init__() got an unexpected keyword argument 'channel'
I checked that GOOGLE_APPLICATION_CREDENTIALS is set correctly, that the required APIs are enabled on Google Cloud, and that all required roles are assigned to the service account. mrjob successfully uploads the files to Google Cloud Storage, but fails as soon as it tries to create a new Dataproc cluster.
What could be wrong?
Command line used to launch the mrjob job on Dataproc:
$ python3 freq.py -r dataproc words.txt
Current Python environment:
$ python3 -VV
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
$ pip3 list | grep google
google-api-core 1.23.0
google-auth 1.23.0
google-auth-oauthlib 0.4.2
google-cloud-core 1.4.3
google-cloud-dataproc 2.0.2
google-cloud-logging 1.15.1
google-cloud-storage 1.32.0
google-crc32c 1.0.0
google-pasta 0.2.0
google-resumable-media 1.1.0
googleapis-common-protos 1.52.0
$ pip3 list | grep mrjob
mrjob 0.7.4
Reputation: 311
The solution was to downgrade google-cloud-dataproc to 1.1.1.
After stepping through the mrjob implementation, I found that mrjob 0.7.4 calls the constructor of google.cloud.dataproc_v1beta2.ClusterControllerClient with a keyword argument (channel, as the traceback shows) that was renamed in the google-cloud-dataproc library as of version 2.0.0.
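The same kind of failure can be reproduced without touching Google Cloud. The sketch below uses two hypothetical stand-in classes (OldStyleClient and NewStyleClient are made up for illustration and are not the real library classes, and their parameter lists are assumptions) to show how a caller that still passes a renamed keyword ends up with this TypeError:

```python
class OldStyleClient:
    """Hypothetical stand-in for a 1.x-style client that accepts `channel`."""

    def __init__(self, channel=None, credentials=None):
        self.channel = channel
        self.credentials = credentials


class NewStyleClient:
    """Hypothetical stand-in for a 2.x-style client with no `channel` keyword."""

    def __init__(self, credentials=None, transport=None):
        self.credentials = credentials
        self.transport = transport


def try_construct(client_cls):
    """Call the constructor the way an older caller would, passing `channel`."""
    try:
        client_cls(channel=object())
        return "ok"
    except TypeError as exc:
        # Python raises TypeError when a keyword is not in the signature.
        return str(exc)


if __name__ == "__main__":
    print(try_construct(OldStyleClient))
    print(try_construct(NewStyleClient))
```

The caller code is identical in both cases; only the constructor signature differs, which is why pinning the library to a version with the old signature makes the error go away.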
How to downgrade with pip3:
$ pip3 install --force-reinstall --no-deps google-cloud-dataproc==1.1.1
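To confirm which version is active before launching a job, a small check along these lines may help. It is only a sketch: the helper names are made up for illustration, and the "major version below 2" rule is taken from the observation above about mrjob 0.7.4, not from any official compatibility table.

```python
from importlib import metadata  # standard library since Python 3.8


def major_version(version):
    """Return the leading integer of a dotted version string such as '2.0.2'."""
    return int(version.split(".")[0])


def is_mrjob_compatible(version):
    """Assumption from the answer above: mrjob 0.7.4 needs a release below 2.0.0."""
    return major_version(version) < 2


def installed_dataproc_version():
    """Return the installed google-cloud-dataproc version, or None if absent."""
    try:
        return metadata.version("google-cloud-dataproc")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dataproc_version()
    if found is None:
        print("google-cloud-dataproc is not installed")
    elif is_mrjob_compatible(found):
        print("google-cloud-dataproc", found, "should work with mrjob 0.7.4")
    else:
        print("google-cloud-dataproc", found, "is 2.x or newer; consider pinning 1.1.1")
```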