Reputation: 57391
I'm running a Scrapy project with an Item Pipeline which writes data to a PostgreSQL database. It is a Docker Compose multi-container application with the following docker-compose.yml:
version: '3'
services:
  privoxy:
    build: ./privoxy
    links:
      - tor
  tor:
    build:
      context: ./tor
      args:
        password: "foo"
  scraper:
    build: ./scraper
    environment:
      - http_proxy=http://privoxy:8118
      - PGPASSWORD=iperpassword
    links:
      - tor
      - privoxy
      - db
    volumes:
      - scraper-data:/scraper/apkmirror_scraper/data
      - scraper-jobdir:/scraper/apkmirror_scraper/crawls
  db:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=iperpassword
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  scraper-data:
  scraper-jobdir:
  db-data:
I'm using the postgres image from the official repository (https://hub.docker.com/_/postgres/), and have made the /var/lib/postgresql/data directory into a volume following https://docs.docker.com/compose/compose-file/#volume-configuration-reference.
The problem is that when I start the application, the db service outputs logs and errors pertaining to things I tried previously, not what I'm trying to do now. This is what I see:
apkmirrorscrapercompose_tor_1 is up-to-date
apkmirrorscrapercompose_db_1 is up-to-date
apkmirrorscrapercompose_privoxy_1 is up-to-date
Recreating apkmirrorscrapercompose_scraper_1
Attaching to apkmirrorscrapercompose_tor_1, apkmirrorscrapercompose_db_1, apkmirrorscrapercompose_privoxy_1, apkmirrorscrapercompose_scraper_1
scraper_1 | List of databases
scraper_1 | Name | Owner | Encoding | Collate | Ctype | Access privileges
scraper_1 | -----------+----------+----------+------------+------------+-----------------------
scraper_1 | postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
scraper_1 | template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
scraper_1 | | | | | | postgres=CTc/postgres
scraper_1 | template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
scraper_1 | | | | | | postgres=CTc/postgres
scraper_1 | (3 rows)
scraper_1 |
scraper_1 | Postgres is up - executing command
scraper_1 | Tor appears to be working. Proceeding with command...
scraper_1 | 2017-06-30 12:32:59 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: apkmirror_scraper)
scraper_1 | 2017-06-30 12:32:59 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'apkmirror_scraper', 'NEWSPIDER_MODULE': 'apkmirror_scraper.spiders', 'SPIDER_MODULES': ['apkmirror_scraper.spiders']}
scraper_1 | 2017-06-30 12:32:59 [scrapy.middleware] INFO: Enabled extensions:
scraper_1 | ['scrapy.extensions.corestats.CoreStats',
scraper_1 | 'scrapy.extensions.telnet.TelnetConsole',
scraper_1 | 'scrapy.extensions.memusage.MemoryUsage',
scraper_1 | 'scrapy.extensions.closespider.CloseSpider',
scraper_1 | 'scrapy.extensions.logstats.LogStats']
db_1 | The files belonging to this database system will be owned by user "postgres".
db_1 | This user must also own the server process.
db_1 |
db_1 | The database cluster will be initialized with locale "en_US.utf8".
db_1 | The default database encoding has accordingly been set to "UTF8".
db_1 | The default text search configuration will be set to "english".
db_1 |
db_1 | Data page checksums are disabled.
db_1 |
db_1 | fixing permissions on existing directory /var/lib/postgresql/data ... ok
db_1 | creating subdirectories ... ok
db_1 | selecting default max_connections ... 100
db_1 | selecting default shared_buffers ... 128MB
db_1 | selecting dynamic shared memory implementation ... posix
db_1 | creating configuration files ... ok
db_1 | running bootstrap script ... ok
db_1 | performing post-bootstrap initialization ... ok
db_1 |
db_1 | WARNING: enabling "trust" authentication for local connections
db_1 | You can change this by editing pg_hba.conf or using the option -A, or
db_1 | --auth-local and --auth-host, the next time you run initdb.
db_1 | syncing data to disk ... ok
db_1 |
db_1 | Success. You can now start the database server using:
db_1 |
db_1 | pg_ctl -D /var/lib/postgresql/data -l logfile start
db_1 |
db_1 | waiting for server to start....LOG: database system was shut down at 2017-06-29 13:40:40 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | done
db_1 | server started
db_1 | ALTER ROLE
db_1 |
db_1 |
db_1 | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
db_1 |
db_1 | LOG: received fast shutdown request
db_1 | waiting for server to shut down....LOG: aborting any active transactions
db_1 | LOG: autovacuum launcher shutting down
db_1 | LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | done
db_1 | server stopped
db_1 |
db_1 | PostgreSQL init process complete; ready for start up.
db_1 |
db_1 | LOG: database system was shut down at 2017-06-29 13:40:41 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | LOG: received smart shutdown request
db_1 | LOG: autovacuum launcher shutting down
db_1 | LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | LOG: database system was shut down at 2017-06-29 13:41:53 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | LOG: received smart shutdown request
db_1 | LOG: autovacuum launcher shutting down
db_1 | LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | LOG: database system was shut down at 2017-06-29 13:43:22 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | LOG: received smart shutdown request
db_1 | LOG: autovacuum launcher shutting down
db_1 | LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | LOG: database system was shut down at 2017-06-29 13:52:02 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | ERROR: column "developer" of relation "apks" does not exist at character 31
db_1 | STATEMENT: INSERT INTO apks (url, title, developer, app, version_name, version_code, architectures, package, apk_file_size, android_min_version, android_target_version, supported_dpis, md5_signature, time_uploaded, time_scraped, image_urls, images, file_urls, files, created_on, updated_on) VALUES ('http://www.apkmirror.com/apk/sony-mobile-communications/backup-restore-unknown-developer/backup-restore-unknown-developer-1-2-a-1-23-release/backup-restore-1-2-a-1-23-android-apk-download/', 'Backup & restore 1.2.A.1.23', 'Sony Mobile Communications', 'Backup & restore', '1.2.A.1.23', '2360343', ARRAY['arm'], 'com.sonymobile.synchub', 9541452, '6.0', '7.1', ARRAY['nodpi'], 'eda39fedfc33ea939a5bb4926b3980c1', '2017-04-25T16:30:00+00:00'::timestamptz, '2017-06-29T14:10:39+00:00'::timestamptz, ARRAY['http://www.apkmirror.com/wp-content/themes/APKMirror/ap_resize/ap_resize.php?src=http%3A%2F%2Fwww.apkmirror.com%2Fwp-content%2Fuploads%2F2017%2F04%2F58ff36aea8e0c.png&w=96&h=96&q=100'], '{}', ARRAY['http://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=203068'], '{}', '2017-06-29T14:19:21.661193'::timestamp, '2017-06-29T14:19:21.661207'::timestamp) RETURNING apks.id
db_1 | LOG: received smart shutdown request
db_1 | LOG: autovacuum launcher shutting down
db_1 | LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | LOG: database system was shut down at 2017-06-29 16:28:54 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
Although I'm not showing details on the application, my main point is that I'm positive that data shown in the error message is from a previous run, and not from the APK download page that I'm currently trying to scrape.
In short, it seems like PostgreSQL is 'caching' previous statements and trying to re-run them even after I stop the application and run docker-compose build and docker-compose up. It seems to me that this behavior must somehow be driven by the data in /var/lib/postgresql/data, since this volume is the only thing that is persisted across restarting the container.
Is this indeed the case? If so, how can I disable this 'statement caching' behavior?
Upvotes: 2
Views: 908
Reputation: 36733
This just means that docker-compose is restarting your previously stopped container (or re-attaching to it, if it is already running), because there is no need to rebuild the image:
apkmirrorscrapercompose_db_1 is up-to-date
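So what you are seeing "again" in the logs is the docker logs output of that container, which includes everything postgres has logged over the container's lifetime. Nothing is being cached or re-run; the old errors are just old log lines.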
Simply remove the container and start again (the db data is persisted in the volume, so it will survive):
docker-compose rm db
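As a rough sketch of the full sequence (assuming the service is named db as in the compose file above): docker-compose rm only removes stopped containers, so the container has to be stopped first if it is still running. The DC variable and the two function names below are hypothetical helpers introduced for illustration, not part of Docker Compose.

```shell
#!/bin/sh
# DC is a hypothetical variable holding the compose command, so the
# sequence can be dry-run by setting DC="echo docker-compose".
DC="${DC:-docker-compose}"

# Stop, remove, and recreate only the db container.
# The named volume db-data is not touched, so the database files survive;
# the new container starts with an empty log history.
reset_db_container() {
    $DC stop db &&
    $DC rm -f db &&
    $DC up -d db
}

# Alternative that keeps the container: show only the last N log lines
# (default 20) instead of the container's whole log history.
recent_db_logs() {
    $DC logs --tail="${1:-20}" db
}
```

Calling reset_db_container once and then running docker-compose up as before should show only the output of the current run. If you would rather keep the container, recent_db_logs 50 prints just the most recent 50 lines.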
Upvotes: 4