My application (clads) runs on Django and uses Celery for scheduled and asynchronous tasks. Unfortunately, I can't figure out some permission issues that are preventing the Celery processes from writing to the Django application logs or from manipulating files created by the Django application. The Django application runs in the wsgi process, and I have configuration files that set up the application log directory so that the wsgi process can write to it (see below).
However, the Celery processes appear to run as a different user that doesn't have permission to write to these log files (which Celery automatically tries to open because of the log file configuration, also below; note that I tried changing it to run as wsgi, but that didn't work). The same permissions issue also prevents the Celery process from manipulating temporary files created by the Django application, which is a requirement of the project.
I'm admittedly very rusty on Unix-type OSes, so I'm sure I'm missing something simple. I've been searching this site and others on and off for a few days, and while many posts have gotten me close, I still can't solve it. I suspect I need some additional commands in my config to set permissions, or to run Celery under a different user. Any help would be greatly appreciated. The project configuration and pertinent code files are excerpted below. Most of the configuration files were cobbled together from information found on this and other sites; sorry for not citing them, but I didn't keep close enough records to know exactly where they came from.
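(For what it's worth, one quick way to confirm which account each process actually runs under is to log it from inside the process itself; this is just a generic sketch, nothing project-specific:)

import os
import pwd

def log_identity(logger):
    # Log the effective user of the current process (web app or Celery worker).
    logger.info('running as %s (uid=%s)', pwd.getpwuid(os.getuid()).pw_name, os.getuid())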
Log and Celery portions of settings.py
#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'django_log_file': {
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('DJANGO_LOG_FILE'),
            'formatter': 'verbose',
        },
        'app_log_file': {
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('CLADS_LOG_FILE'),
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['django_log_file'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['app_log_file'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
WSGI_APPLICATION = 'clads.wsgi.application'
# celery settings
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_SEND_EVENTS = False
CELERY_BROKER_URL = os.environ.get('BROKER_URL')
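Because both handlers are FileHandlers, any process that loads this settings module - the wsgi process and the Celery worker alike - tries to open DJANGO_LOG_FILE and CLADS_LOG_FILE as soon as logging is configured, which is where the worker hits the permission error. For reference, the worker is started with -A clads, so the Celery app wiring follows the usual Django pattern; the file below is a sketch of that wiring, not an exact copy of my project file:

# clads/celery.py - sketch of the standard Django/Celery wiring assumed here,
# not necessarily identical to the project's actual file.
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'clads.settings')

app = Celery('clads')
# Pick up the CELERY_* settings shown above from Django's settings module.
app.config_from_object('django.conf:settings')
# Find @shared_task functions (like the ones in tasks.py below) in installed apps.
app.autodiscover_tasks()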
tasks.py excerpts
# imports implied by the excerpt below
import logging
import os

import boto3
from celery import shared_task
from django.conf import settings

LOGGER = logging.getLogger('clads.pit')

@shared_task(name="archive_pit_file")
def archive_pit_file(tfile_name):
    LOGGER.debug('archive_pit_file called for ' + tfile_name)
    LOGGER.debug('connecting to S3 ...')
    s3 = boto3.client('s3')
    file_fname = os.path.join(settings.TEMP_FOLDER, tfile_name)
    LOGGER.debug('reading temp file from ' + file_fname)
    s3.upload_file(file_fname, settings.S3_ARCHIVE, tfile_name)
    LOGGER.debug('cleaning up temp files ...')
    # THIS LINE CAUSES PROBLEMS BECAUSE THE CELERY PROCESS DOESN'T HAVE
    # PERMISSION TO REMOVE THE WSGI-OWNED FILE
    os.remove(file_fname)
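On the temp-file side of the problem: deleting a file actually requires write permission on the containing directory, not on the file itself, so one direction would be for the web app to make the temp folder (and the files it drops there) group-writable by a group both users share. This is only a sketch - the helper name is made up, and it assumes such a shared group exists:

# Hypothetical helper on the web-app side (sketch only). Assumes the wsgi
# user and the Celery user share a group so the worker can later remove the file.
import os
import stat

from django.conf import settings

def write_temp_file(tfile_name, data):
    path = os.path.join(settings.TEMP_FOLDER, tfile_name)
    with open(path, 'wb') as f:
        f.write(data)
    # Group read/write on the file itself ...
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)
    # ... and group write (plus setgid) on the folder, since os.remove() in the
    # task needs write access to the directory, not the file.
    os.chmod(settings.TEMP_FOLDER, stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID)
    return path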
logging.config
commands:
  01_change_permissions:
    command: chmod g+s /opt/python/log
  02_change_owner:
    command: chown root:wsgi /opt/python/log
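One thing worth checking is whether those two commands actually leave the log directory in the state I expect - in particular, chmod g+s only sets the setgid bit; it does not grant group write. A quick check (sketch only, run on the instance):

# Sketch: confirm the group and mode bits on the log directory after deploy.
import grp
import os
import stat

st = os.stat('/opt/python/log')
print('group:', grp.getgrgid(st.st_gid).gr_name)            # expect 'wsgi'
print('setgid:', bool(st.st_mode & stat.S_ISGID))           # expect True
print('group writable:', bool(st.st_mode & stat.S_IWGRP))   # not set by g+s; may still be False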
99_celery.config
container_commands:
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
celery_configuration.txt
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}
# Create celery configuration script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A clads -b <broker_url> --loglevel=INFO --without-gossip --without-mingle --without-heartbeat
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A clads -b <broker_url> --loglevel=INFO --workdir=/tmp
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
    echo "[include]" | tee -a /opt/python/etc/supervisord.conf
    echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
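Since the generated supervisord config runs the worker as user=nobody, one way to see the problem from inside the worker itself (rather than guessing) is a small startup hook - purely a diagnostic sketch, not part of my current setup:

# Diagnostic sketch: when the worker comes up, log its uid and whether it can
# actually write the app log file configured in the settings above.
import logging
import os

from celery.signals import worker_ready

LOGGER = logging.getLogger('clads')

@worker_ready.connect
def report_log_access(sender=None, **kwargs):
    log_file = os.environ.get('CLADS_LOG_FILE')
    can_write = bool(log_file) and os.access(log_file, os.W_OK)
    LOGGER.info('worker uid=%s; can write %s: %s', os.getuid(), log_file, can_write)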
I wasn't able to figure out the permissions issue exactly, but I found a workaround that might help others. I removed the FileHandler configurations from the log settings and replaced them with a StreamHandler. This got around the permissions issue because the Celery processes no longer have to open a log file owned by the wsgi user.
The log messages from the web app end up in the httpd error log - not ideal, but at least I can find them, and they are accessible through the Elastic Beanstalk console as well - and the Celery logs are written to celery-worker.log and celery-beat.log in /var/log. I can't access those through the console, but I can get to them by logging directly onto the instance. This isn't ideal either, since those logs won't get rotated and will be lost if the instance is retired, but at least it got me going for the time being.
Here are the modified log settings that got it working this way:
#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        }
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
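With this in place, the task code doesn't change at all: StreamHandler writes to stderr, and supervisord captures the worker's stderr into /var/log/celery-worker.log (per the stderr_logfile setting above), so that's where the LOGGER calls from tasks.py now show up. A trivial way to confirm it (the task name here is made up):

# Hypothetical smoke-test task: run it, then `tail /var/log/celery-worker.log`
# on the instance and the line below should appear.
import logging

from celery import shared_task

LOGGER = logging.getLogger('clads')

@shared_task(name="log_smoke_test")
def log_smoke_test():
    LOGGER.info('console handler reached from the Celery worker')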