Reputation: 179
I have a small application that uses FastAPI and Playwright to scrape data and send it back to the client. The program works properly when I run it locally, but when I run it as a Docker image it fails with the following error:
Looks like you launched a headed browser without having a XServer running.
Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright.
Naturally, I tried running it with headless=True, but then the code fails with this error:
net::ERR_EMPTY_RESPONSE at https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true
logs
navigating to "https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true",
waiting until "load"
I also tried running it locally with headless=True, and it failed with a "Timeout 30000ms exceeded" error.
This is the function I'm using to return the page HTML:
def extract_html(self):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto('https://book.flygofirst.com/Flight/Select?inl={}&CHD={}&s=True&o1={}&d1={}&ADT={}&dd1={}&gl=0&glo=0&cc=INR&mon=true'.format(self.infants, self.children, self.origin, self.destination, self.adults, self.date))
        html = page.inner_html('#sectionBody')
        return html
and this is my Dockerfile:
FROM python:3.9-slim
COPY ../../requirements/dev.txt ./
RUN python3 -m ensurepip
RUN pip install -r dev.txt
RUN playwright install
RUN playwright install-deps
ENV PYTHONPATH "${PYTHONPATH}:/app/"
WORKDIR /code/src
COPY ./src /app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
I hope someone can figure out what I'm doing wrong.
Upvotes: 2
Views: 4591
Reputation: 306
Locally it works because the GUI components needed to open a browser are already installed (this matters especially with headless=False). In a Docker environment, additional steps are required. I resolved it this way:
Dockerfile:
# replace {latest_stable_version} with the latest stable release (in my case `30.0`)
FROM mcr.microsoft.com/playwright/python:v1.{latest_stable_version}-focal
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y xvfb
RUN apt-get install -qqy x11-apps
# chromium dependencies
RUN apt-get install -y libnss3 \
    libxss1 \
    libasound2 \
    fonts-noto-color-emoji
# additional actions related to your project
# exactly this kind of magic command :)
ENTRYPOINT ["/bin/sh", "-c", "/usr/bin/xvfb-run -a $@", ""]
docker-compose.yml:
service_name:
  build: .
  init: true
  command: # command depending on a project
  environment:
    - DISPLAY=:0
  volumes:
    - /tmp/.X11-unix:/tmp/.X11-unix
Hope this helps.
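If the same code has to run both on a desktop and inside the container, one option is to pick headless mode from the environment rather than hard-coding it. This is only a sketch: `launch_options` is a hypothetical helper, not part of Playwright, and it assumes that a non-empty DISPLAY variable means an X server (a real one or Xvfb) is reachable.

```python
import os


def launch_options(env=None):
    """Build keyword arguments for p.chromium.launch().

    Assumption: a non-empty DISPLAY means an X server (or Xvfb) is
    available, so a headed browser can start; otherwise use headless.
    """
    env = os.environ if env is None else env
    has_display = bool(env.get("DISPLAY"))
    return {"headless": not has_display}


# Usage with Playwright (not executed here):
# with sync_playwright() as p:
#     browser = p.chromium.launch(**launch_options())
```

With the xvfb-run entrypoint above, DISPLAY should be set for the wrapped process, so the browser would launch headed; in a plain container without it, the helper falls back to headless.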
Upvotes: 2
Reputation: 1654
After investigating and trying several things, it looks like the problem is the browser's user agent in headless mode. For some reason, the page does not accept the default headless user agent. Try this:
def extract_html(self):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36')
        page.goto('http://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true')
        html = page.inner_html('#sectionBody')
        return html
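For context, headless Chromium usually identifies itself with a `HeadlessChrome/<version>` token in its default user agent, which is a likely reason a site would treat it differently. Instead of hard-coding a full user agent string that goes stale, a minimal sketch is to read the browser's own user agent and replace that token. `normalize_user_agent` is a hypothetical helper written for illustration, not a Playwright API.

```python
def normalize_user_agent(user_agent: str) -> str:
    """Replace the 'HeadlessChrome' product token with plain 'Chrome'.

    Strings without the token are returned unchanged.
    """
    return user_agent.replace("HeadlessChrome", "Chrome")


# Usage with Playwright (not executed here; assumes `browser` was launched
# with headless=True as in the function above):
# probe = browser.new_page()
# default_ua = probe.evaluate("navigator.userAgent")
# probe.close()
# page = browser.new_page(user_agent=normalize_user_agent(default_ua))
```

This keeps the reported version in sync with the installed browser, though a site may use other signals besides the user agent, so it is not guaranteed to be enough.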
Upvotes: 3