Summarize the problem:
The Python package opens the PDFs in a batch folder, reads the first page of each PDF, matches keywords, and moves the matching PDFs into a source folder for the OCR scripts to pick up. The first script to take all the PDFs is MainBankClass.py. I am trying to use a docker-compose file to put all these Python scripts on the same network and volume, so that each OCR script starts scanning bank statements once the pre-processing is done. This link is the closest I have found to accomplishing this, but it seems I missed some parts of it.
The different OCR scripts are called with runpy.run_path(path_name='ChaseOCR.py'), so these scripts sit in the same directory as __init__.py. Here is the filesystem structure:
BankStatements
┣ BankofAmericaOCR
┃ ┣ BancAmericaOCR.py
┃ ┗ Dockerfile.bankofamerica
┣ ChaseBankStatementOCR
┃ ┣ ChaseOCR.py
┃ ┗ Dockerfile.chase
┣ WellsFargoStatementOCR
┃ ┣ Dockerfile.wellsfargo
┃ ┗ WellsFargoOCR.py
┣ BancAmericaOCR.py
┣ ChaseOCR.py
┣ Dockerfile
┣ WellsFargoOCR.py
┣ __init__.py
┗ docker-compose.yml
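For readers unfamiliar with this kind of pre-processing step, here is a minimal sketch of the flow described above, assuming PyMuPDF (imported as fitz) is used to read the first page. The keyword table, the function names and the folder arguments are hypothetical, since the question does not show the real MainBankClass.py code.

```python
import shutil
from pathlib import Path

# Hypothetical keyword table: the real keywords used by MainBankClass.py are not shown.
BANK_KEYWORDS = {
    "ChaseOCR.py": ["chase"],
    "BancAmericaOCR.py": ["bank of america"],
    "WellsFargoOCR.py": ["wells fargo"],
}


def first_page_text(pdf_path):
    """Return the text of the first page of a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so match_bank() works without it

    with fitz.open(str(pdf_path)) as doc:
        # get_text() in recent PyMuPDF releases (older ones called it getText())
        return doc[0].get_text()


def match_bank(text):
    """Return the OCR script whose keywords appear in the text, or None."""
    lowered = text.lower()
    for script, keywords in BANK_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return script
    return None


def preprocess(batch_dir, source_dir):
    """Move every recognised PDF from batch_dir to source_dir; return the scripts to run."""
    source = Path(source_dir)
    source.mkdir(parents=True, exist_ok=True)
    scripts = set()
    for pdf in sorted(Path(batch_dir).glob("*.pdf")):
        script = match_bank(first_page_text(pdf))
        if script is not None:
            shutil.move(str(pdf), str(source / pdf.name))
            scripts.add(script)
    return scripts
```

The set returned by preprocess() tells the caller which OCR scripts actually have work to do, so a bank with no statements in the batch can be skipped.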
What I've tried so far:
In docker-compose.yml:
version: '3'
services:
  mainbankclass_container:
    build:
      context: '.'
      dockerfile: Dockerfile
    volumes:
      - /Users:/Users
    #links:
    #  - "chase_container"
    #  - "wellsfargo_container"
    #  - "bankofamerica_container"
  chase_container:
    build: .
    working_dir: /app/ChaseBankStatementOCR
    command: ./ChaseOCR.py
    volumes:
      - /Users:/Users
  bankofamerica_container:
    build: .
    working_dir: /app/BankofAmericaOCR
    command: ./BancAmericaOCR.py
    volumes:
      - /Users:/Users
  wellsfargo_container:
    build: .
    working_dir: /app/WellsFargoStatementOCR
    command: ./WellsFargoOCR.py
    volumes:
      - /Users:/Users
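Note that Compose starts these four services together, and depends_on in a version '3' file only orders container start-up; it does not wait for a script to finish. If the OCR containers must wait for pre-processing, one option is a marker file on the shared /Users volume. This is a minimal sketch under that assumption; the marker path and both function names are hypothetical and not part of the original setup.

```python
import time
from pathlib import Path


def signal_done(marker):
    """Called by the pre-processing step once every PDF has been moved."""
    path = Path(marker)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def wait_for_marker(marker, timeout=600.0, poll_interval=1.0):
    """Block until `marker` exists; return True, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    path = Path(marker)
    while not path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Each OCR script would call wait_for_marker() with the same hypothetical path (for example somewhere under the mounted source folder) before scanning, while MainBankClass.py calls signal_done() at the end of pre-processing.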
Each Dockerfile in the bank folders is the same except for CMD, which changes accordingly. For example, in the ChaseBankStatementOCR folder:
FROM python:3.7-stretch
WORKDIR /app
COPY . /app
# CMD is changed here for the other two bank scripts
CMD ["python3", "ChaseOCR.py"]
The last piece is the top-level Dockerfile, outside the bank folders:
FROM python:3.7-stretch
WORKDIR /app
COPY ./requirements.txt ./
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade PyMuPDF
COPY . /app
COPY ./ChaseOCR.py /app
COPY ./BancAmericaOCR.py /app
COPY ./WellsFargoOCR.py /app
EXPOSE 8080
CMD ["python3", "MainBankClass.py"]
After running docker-compose build, the build completes successfully. The error occurs when I run:
docker run -v /Users:/Users: python3 python3 ~/BankStatementsDemoOCR/BankStatements/MainBankClass.py
and the error message is:
FileNotFoundError: [Errno 2] No such file or directory: 'BancAmericaOCR.py'
I assume the container doesn't have BancAmericaOCR.py, but I have composed every .py file under the same network, and I don't think links are good practice, since Docker recommends using networks here. What am I missing? Any help is much appreciated. Thanks in advance.
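One likely cause of this particular error: runpy.run_path resolves a relative path such as 'BancAmericaOCR.py' against the process's current working directory, not against the folder that holds the calling script. When MainBankClass.py is started by its absolute path, the working directory is wherever the container started (docker run's -w flag can change it), so the sibling script is not found. A minimal sketch of resolving it relative to the caller instead; run_ocr_script is a hypothetical helper name.

```python
import runpy
from pathlib import Path


def run_ocr_script(script_name, base_dir=None):
    """Run a sibling OCR script by absolute path instead of a cwd-relative name."""
    # Default to the directory containing this file (MainBankClass.py), whatever the cwd is
    base = Path(base_dir) if base_dir is not None else Path(__file__).resolve().parent
    script = base / script_name
    if not script.is_file():
        raise FileNotFoundError(f"{script_name} not found in {base}")
    # run_name="__main__" makes the script's `if __name__ == "__main__":` block execute
    return runpy.run_path(str(script), run_name="__main__")
```

Calling run_ocr_script('BancAmericaOCR.py') then behaves the same whether the program is launched from /app or from any other directory.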
After days of searching on my case, I am closing this thread with the single-application, single-container implementation suggested in this link from the Docker forum. Instead of docker-compose, the suggested approach is one Dockerfile and one container for the whole application, and it works as expected.
On top of the Dockerfile, we also need a network for the different .py files to communicate. For example:
docker network create my_net
docker run -it --network my_net -v /Users:/Users --rm my_awesome_app
EDIT: No network is needed, since we are only running one container.
EDIT 2: Please see the accepted answer for future reference.
Better ideas for this case are welcome.
single application in a single container ... need networks for different py files to communicate
You only have one container. Docker networks are for multiple containers to talk to one another. Docker Compose also creates a default bridge network shared by all services, so you wouldn't need to create one even if you were still using docker-compose.
Here's a cleaned-up Dockerfile with all the scripts copied in, plus an entrypoint file:
FROM python:3.7-stretch
WORKDIR /app
COPY ./requirements.txt ./
RUN pip3 install --upgrade pip PyMuPDF && pip3 install -r requirements.txt
COPY . /app
COPY ./docker-entrypoint.sh /
# The entrypoint script must be executable inside the image
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
In your entrypoint, you can loop over the scripts and run each one in turn:
#!/bin/bash
# Stop at the first script that fails
set -e
for b in ChaseOCR WellsFargoOCR BancAmericaOCR ; do
    python3 "/app/${b}.py"
done
exec python3 /app/MainBankClass.py
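If you would rather keep the orchestration in Python than in bash, the same sequential loop can be sketched with subprocess. This is an illustration, not part of the answer: run_in_order is a hypothetical name, and /app is assumed from the Dockerfile's WORKDIR. Because the order is just the list passed in, MainBankClass.py can go first if pre-processing has to finish before the OCR scripts run, as the question describes.

```python
import subprocess
import sys


def run_in_order(scripts, app_dir="/app"):
    """Run each script to completion, in order, stopping at the first non-zero exit."""
    completed = []
    for name in scripts:
        # check=True raises CalledProcessError if a script fails, like `set -e` in bash
        subprocess.run([sys.executable, name], cwd=app_dir, check=True)
        completed.append(name)
    return completed
```

For example, run_in_order(["MainBankClass.py", "ChaseOCR.py", "BancAmericaOCR.py", "WellsFargoOCR.py"]) runs the pre-processing first and then each OCR script from the question's tree, with /app as the working directory so relative paths resolve there.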