Reputation: 551
Exception:
Failed with exception java.io.IOException:java.io.IOException: Somehow read -1 bytes trying to skip 6257 more bytes to seek to position 6708, size: 1290047
Does anyone have any idea how to fix this on Cloud Dataproc?
Upvotes: 0
Views: 208
Reputation: 10687
It looks like you're probably hitting this known issue which is somewhat specific to reading ORC files. The GCS connector version 1.5.4 has the fix, and is rolling out in Dataproc this week (expected to be fully rolled out by this Friday, October 14th).
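If you are not sure which connector version a node is running, a small check like the following may help. It is a sketch, not a Dataproc or connector tool: the function names `connector_version` and `needs_update` are hypothetical, it assumes jar names of the form `gcs-connector-<version>-hadoop2.jar` in `/usr/lib/hadoop/lib` (the directory the initialization action below cleans out), and it relies on GNU `sort -V` for version ordering.

```shell
#!/bin/bash
# Hypothetical helper (not part of Dataproc or the GCS connector): decide
# whether a GCS connector jar predates 1.5.4, the release with the fix.
FIXED_VERSION="1.5.4"

# Extract "X.Y.Z" from a name like gcs-connector-1.5.2-hadoop2.jar
# (assumes the gcs-connector-<version>-<suffix>.jar naming pattern).
connector_version() {
  local name
  name=$(basename "$1")
  name=${name#gcs-connector-}
  echo "${name%%-*}"
}

# Exit status 0 if the jar's version is strictly older than FIXED_VERSION.
# GNU sort -V orders version strings numerically per component.
needs_update() {
  local v
  v=$(connector_version "$1")
  [ "$v" != "$FIXED_VERSION" ] &&
    [ "$(printf '%s\n%s\n' "$v" "$FIXED_VERSION" | sort -V | head -n1)" = "$v" ]
}

# On a cluster node, report on every connector jar in Hadoop's lib directory.
for jar in /usr/lib/hadoop/lib/gcs-connector*.jar; do
  [ -e "$jar" ] || continue   # glob matched nothing; skip silently
  if needs_update "$jar"; then
    echo "$jar: older than $FIXED_VERSION, update recommended"
  else
    echo "$jar: $FIXED_VERSION or newer"
  fi
done
```

Comparing with `sort -V` rather than a plain string comparison keeps a version such as 1.5.10 from being treated as older than 1.5.4.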
In the meantime, you can use a small initialization action to update the connector version on your Dataproc clusters automatically. Create a file named update-gcs-1.5.4.sh:
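#!/bin/bash
# Stop on the first failed command and log each command to the init action output.
set -euxo pipefail
# Remove the bundled GCS connector jar(s), then fetch the 1.5.4 release.
rm -f /usr/lib/hadoop/lib/gcs-connector*.jar
gsutil cp gs://hadoop-lib/gcs/gcs-connector-1.5.4-hadoop2.jar /usr/lib/hadoop/lib/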
Then upload that file to a GCS bucket:
gsutil cp update-gcs-1.5.4.sh gs://<YOUR_BUCKET_HERE>/update-gcs-1.5.4.sh
Then create your Dataproc cluster:
gcloud dataproc clusters create <YOUR_CLUSTER_NAME_HERE> \
    --initialization-actions gs://<YOUR_BUCKET_HERE>/update-gcs-1.5.4.sh
Upvotes: 1