Reputation: 2553
I want to mount a Google Cloud Storage (GCS) bucket into my Airflow environment so that I can read and write files in that bucket. I am using Cloud Composer 2 (composer-2.1.14-airflow-2.5.1 image).
In Airflow, I created a DAG that runs the following bash script:
#!/bin/bash
BUCKET="my-bucket"
MOUNT_DIR="/home/airflow/gcs/data/my-bucket"
# Create the $MOUNT_DIR directory and make it group-writable
mkdir -p "$MOUNT_DIR"
sudo chmod g+w "$MOUNT_DIR"
# Mount the GCS bucket (foreground, with full debug logging)
gcsfuse --foreground --debug_fuse --debug_fs --debug_gcs --debug_http -o nonempty "$BUCKET" "$MOUNT_DIR"
Here are the logs from Airflow:
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/airflow/gcs/data/my-bucket
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Opening GCS connection...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Creating a mount at "/home/airflow/gcs/data/my-bucket"
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Creating a new server...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Set up root directory for bucket my-bucket
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - gcs: Req 0x0: <- ListObjects("")
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - gcs: Req 0x0: -> ListObjects("") (131.395831ms): OK
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Mounting file system "my-bucket"...
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Beginning the mounting kickoff process
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Parsing fuse file descriptor
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Preparing for direct mounting
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Directmount failed. Trying fallback.
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Creating a socket pair
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Creating files to wrap the sockets
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - fuse_debug: Starting fusermount/os mount
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - /usr/bin/fusermount: fuse device not found, try 'modprobe fuse' first
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - Error while mounting gcsfuse: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
[2023-09-20, 11:46:39 PDT] {subprocess.py:93} INFO - mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
I have already verified that Airflow can access the bucket: running the following command lists the files in the bucket:
gsutil ls gs://$BUCKET
I also tried running the following command, and I still get the same error as above:
sudo mount -t gcsfuse -o rw,user $BUCKET $MOUNT_DIR
I have referenced the following and a few other pages but I am still not able to mount it:
Update: I upgraded the Composer environment to composer-2.4.2-airflow-2.5.3 and I still see the same error:
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Start gcsfuse/1.0.1 (Go version go1.20.5) for app \"\" using mount point:/home/airflow/gcs/data/my-bucket\n","timestampSeconds":1695254138,"timestampNanos":83062812}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Opening GCS connection...\n","timestampSeconds":1695254138,"timestampNanos":83799366}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Creating a mount at \"/home/airflow/gcs/data/datavant/my-bucket\"\n","timestampSeconds":1695254138,"timestampNanos":87562370}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Creating a new server...\n","timestampSeconds":1695254138,"timestampNanos":87589651}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Set up root directory for bucket my-bucket\n","timestampSeconds":1695254138,"timestampNanos":87599362}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"gcs: Req 0x0: \u003c- ListObjects(\"\")\n","timestampSeconds":1695254138,"timestampNanos":87612220}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"gcs: Req 0x0: -\u003e ListObjects(\"\") (106.665835ms): OK\n","timestampSeconds":1695254138,"timestampNanos":194287578}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Mounting file system \"my-bucket\"...\n","timestampSeconds":1695254138,"timestampNanos":194342795}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Beginning the mounting kickoff process\n","timestampSeconds":1695254138,"timestampNanos":194916407}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Parsing fuse file descriptor\n","timestampSeconds":1695254138,"timestampNanos":194977401}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Preparing for direct mounting\n","timestampSeconds":1695254138,"timestampNanos":194984093}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Directmount failed. Trying fallback.\n","timestampSeconds":1695254138,"timestampNanos":195003380}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Creating a socket pair\n","timestampSeconds":1695254138,"timestampNanos":195238613}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Creating files to wrap the sockets\n","timestampSeconds":1695254138,"timestampNanos":195260643}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"DEBUG","severity":"DEBUG","message":"fuse_debug: Starting fusermount/os mount\n","timestampSeconds":1695254138,"timestampNanos":195270306}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - /usr/bin/fusermount: fuse device not found, try 'modprobe fuse' first
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - {"name":"root","levelname":"INFO","severity":"INFO","message":"Error while mounting gcsfuse: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1\n","timestampSeconds":1695254138,"timestampNanos":198067902}
[2023-09-20, 16:55:38 PDT] {subprocess.py:93} INFO - mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
Upvotes: 0
Views: 914
Reputation: 2553
It is not possible to mount another bucket in Google Cloud Composer's Airflow environment. I confirmed this with Google support.
My workaround was to copy the files I needed into the bucket that holds all the Airflow data (DAGs, etc.) and use that as the local filesystem.
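A minimal sketch of that copy step, in case it helps someone. The Composer bucket name below is a made-up placeholder, and the DRY_RUN switch and the function name are my own additions for illustration. It relies on Composer exposing the data/ folder of the environment's bucket at /home/airflow/gcs/data, which is the same path the script in the question was trying to mount under.

```shell
#!/bin/bash
# Hypothetical names: replace with your own buckets.
SRC_BUCKET="my-bucket"                     # the bucket that could not be mounted
COMPOSER_BUCKET="my-composer-env-bucket"   # assumption: the environment's own bucket
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the command, 0 = run it

# Sync gs://<source> into gs://<composer bucket>/data/<source>.
sync_into_composer_bucket() {
  local src="gs://$1"
  local dst="gs://$2/data/$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "gsutil -m rsync -r $src $dst"
  else
    gsutil -m rsync -r "$src" "$dst"
  fi
}

sync_into_composer_bucket "$SRC_BUCKET" "$COMPOSER_BUCKET"
# Airflow tasks can then read the copied files under:
#   /home/airflow/gcs/data/my-bucket
```

Run it once with the default DRY_RUN=1 to check the printed command, then with DRY_RUN=0 to do the actual sync. Files written by tasks under /home/airflow/gcs/data end up in the Composer bucket, so copying results back to the original bucket would need a similar sync in the other direction.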
Upvotes: 0
Reputation: 1
This issue is common with FUSE-based file systems when the container runs in unprivileged mode.
See https://github.com/s3fs-fuse/s3fs-fuse/issues/647#issuecomment-330398877.
I faced a similar problem while mounting with gcsfuse in a Docker container. Running the container with the --privileged flag resolved the issue for me.
It is therefore possible that Airflow is running its container in unprivileged mode. If that is the case, running the container with the --privileged flag would resolve the issue.
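One quick way to check this from inside the container is to look for the FUSE device itself, since the fusermount error in the question says "fuse device not found". This is only a rough diagnostic sketch; the function name is mine, and it does not prove that a mount would succeed even when the device is present.

```shell
#!/bin/bash
# Report whether the FUSE character device is visible in this container.
fuse_status() {
  if [ -c /dev/fuse ]; then
    echo "present"
  else
    echo "missing"
  fi
}

echo "/dev/fuse is $(fuse_status)"
```

If this prints "missing" when run from the same BashOperator task, the device is not exposed to the container, which would be consistent with the error in the logs above.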
Upvotes: 0
Reputation: 83
The issue seems to come from the FUSE installation rather than from gcsfuse itself. Can you please try this solution: https://forum.odroid.com/viewtopic.php?p=314535&sid=5decaed4623a9aa6c71619ac677d3bf2#p314535
Upvotes: 0