My application (clads) runs on Django and uses Celery for scheduled and asynchronous tasks. Unfortunately, I can't figure out some permission issues that are preventing the Celery processes from writing to the Django application logs or from manipulating files created by the Django application. The Django application runs in the wsgi process, and I have configuration files that set up the application log directory so that the wsgi process can write to it (see below).
However, the Celery processes appear to run as a different user that doesn't have permission to write to these log files (which Celery automatically tries to open because of the log file configuration, also below; note that I tried changing it to run as wsgi, but that didn't work). The same permissions issue also prevents the Celery process from manipulating temporary files created by the Django application, which is a requirement of the project.
I'm admittedly very rusty on Unix-type OSes, so I'm sure I'm missing something simple. I've been searching this site and others on and off for a few days, and while many posts have gotten me close, I still can't solve it. I suspect I need some additional commands in my config to set permissions, or to run Celery under a different user. Any help would be greatly appreciated. The project configuration and pertinent code files are excerpted below. Most of the configuration files were cobbled together from information found on this and other sites; sorry for not citing them, but I didn't keep close enough records to know exactly where they came from.
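(For what it's worth, one quick way to confirm which account each process actually runs under is to log it from inside the process itself; this is just a generic sketch, nothing project-specific:)

import os
import pwd

def log_identity(logger):
    # Log the effective user of the current process (web app or Celery worker).
    logger.info('running as %s (uid=%s)', pwd.getpwuid(os.getuid()).pw_name, os.getuid())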
Log and Celery portions of settings.py
#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'django_log_file': {
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('DJANGO_LOG_FILE'),
            'formatter': 'verbose',
        },
        'app_log_file': {
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('CLADS_LOG_FILE'),
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['django_log_file'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['app_log_file'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
WSGI_APPLICATION = 'clads.wsgi.application'
# celery settings
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_SEND_EVENTS = False
CELERY_BROKER_URL = os.environ.get('BROKER_URL')
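Because both handlers are FileHandlers, any process that loads this settings module - the wsgi process and the Celery worker alike - tries to open DJANGO_LOG_FILE and CLADS_LOG_FILE as soon as logging is configured, which is where the worker hits the permission error. For reference, the worker is started with -A clads, so the Celery app wiring follows the usual Django pattern; the file below is a sketch of that wiring, not an exact copy of my project file:

# clads/celery.py - sketch of the standard Django/Celery wiring assumed here,
# not necessarily identical to the project's actual file.
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'clads.settings')

app = Celery('clads')
# Pick up the CELERY_* settings shown above from Django's settings module.
app.config_from_object('django.conf:settings')
# Find @shared_task functions (like the ones in tasks.py below) in installed apps.
app.autodiscover_tasks()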
tasks.py excerpts
# imports implied by the excerpt below
import logging
import os

import boto3
from celery import shared_task
from django.conf import settings

LOGGER = logging.getLogger('clads.pit')

@shared_task(name="archive_pit_file")
def archive_pit_file(tfile_name):
    LOGGER.debug('archive_pit_file called for ' + tfile_name)
    LOGGER.debug('connecting to S3 ...')
    s3 = boto3.client('s3')
    file_fname = os.path.join(settings.TEMP_FOLDER, tfile_name)
    LOGGER.debug('reading temp file from ' + file_fname)
    s3.upload_file(file_fname, settings.S3_ARCHIVE, tfile_name)
    LOGGER.debug('cleaning up temp files ...')
    # THIS LINE CAUSES PROBLEMS BECAUSE THE CELERY PROCESS DOESN'T HAVE
    # PERMISSION TO REMOVE THE WSGI-OWNED FILE
    os.remove(file_fname)
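On the temp-file side of the problem: deleting a file actually requires write permission on the containing directory, not on the file itself, so one direction would be for the web app to make the temp folder (and the files it drops there) group-writable by a group both users share. This is only a sketch - the helper name is made up, and it assumes such a shared group exists:

# Hypothetical helper on the web-app side (sketch only). Assumes the wsgi
# user and the Celery user share a group so the worker can later remove the file.
import os
import stat

from django.conf import settings

def write_temp_file(tfile_name, data):
    path = os.path.join(settings.TEMP_FOLDER, tfile_name)
    with open(path, 'wb') as f:
        f.write(data)
    # Group read/write on the file itself ...
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)
    # ... and group write (plus setgid) on the folder, since os.remove() in the
    # task needs write access to the directory, not the file.
    os.chmod(settings.TEMP_FOLDER, stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID)
    return path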
logging.config
commands:
  01_change_permissions:
    command: chmod g+s /opt/python/log
  02_change_owner:
    command: chown root:wsgi /opt/python/log
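One thing worth checking is whether those two commands actually leave the log directory in the state I expect - in particular, chmod g+s only sets the setgid bit; it does not grant group write. A quick check (sketch only, run on the instance):

# Sketch: confirm the group and mode bits on the log directory after deploy.
import grp
import os
import stat

st = os.stat('/opt/python/log')
print('group:', grp.getgrgid(st.st_gid).gr_name)            # expect 'wsgi'
print('setgid:', bool(st.st_mode & stat.S_ISGID))           # expect True
print('group writable:', bool(st.st_mode & stat.S_IWGRP))   # not set by g+s; may still be False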
99_celery.config
container_commands:
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
celery_configuration.txt
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}
# Create celery configuration script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A clads -b <broker_url> --loglevel=INFO --without-gossip --without-mingle --without-heartbeat
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A clads -b <broker_url> --loglevel=INFO --workdir=/tmp
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
    echo "[include]" | tee -a /opt/python/etc/supervisord.conf
    echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
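Since the generated supervisord config runs the worker as user=nobody, one way to see the problem from inside the worker itself (rather than guessing) is a small startup hook - purely a diagnostic sketch, not part of my current setup:

# Diagnostic sketch: when the worker comes up, log its uid and whether it can
# actually write the app log file configured in the settings above.
import logging
import os

from celery.signals import worker_ready

LOGGER = logging.getLogger('clads')

@worker_ready.connect
def report_log_access(sender=None, **kwargs):
    log_file = os.environ.get('CLADS_LOG_FILE')
    can_write = bool(log_file) and os.access(log_file, os.W_OK)
    LOGGER.info('worker uid=%s; can write %s: %s', os.getuid(), log_file, can_write)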
I wasn't able to figure out the permissions issue exactly, but I found a workaround that might help others. I removed the FileHandler configurations from the log settings and replaced them with a StreamHandler. This got around the permissions issue because the Celery processes no longer have to open a log file owned by the wsgi user.
The log messages from the web app end up in the httpd error log - not ideal, but at least I can find them, and they are accessible through the Elastic Beanstalk console as well - and the Celery logs are written to celery-worker.log and celery-beat.log in /var/log. I can't access those through the console, but I can get to them by logging directly onto the instance. This isn't ideal either, since those logs won't get rotated and will be lost if the instance is retired, but at least it got me going for the time being.
Here are the modified log settings that got it working this way:
#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        }
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
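With this in place, the task code doesn't change at all: StreamHandler writes to stderr, and supervisord captures the worker's stderr into /var/log/celery-worker.log (per the stderr_logfile setting above), so that's where the LOGGER calls from tasks.py now show up. A trivial way to confirm it (the task name here is made up):

# Hypothetical smoke-test task: run it, then `tail /var/log/celery-worker.log`
# on the instance and the line below should appear.
import logging

from celery import shared_task

LOGGER = logging.getLogger('clads')

@shared_task(name="log_smoke_test")
def log_smoke_test():
    LOGGER.info('console handler reached from the Celery worker')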