Reputation: 103
I am trying to write an ansible playbook to crawl a website and then store its contents into a static file under aws s3 bucket. Here is the crawler code :
"""
Handling pages with the Next button
"""
import sys
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
url = "https://xyz.co.uk/"
file_name = "web_content.txt"
while True:
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
raw_html = soup.prettify()
file = open(file_name, 'wb')
print('Collecting the website contents')
file.write(raw_html.encode())
file.close()
print('Saved to %s' % file_name)
#print(type(raw_html))
# Finding next page
next_page_element = soup.select_one('li.next > a')
if next_page_element:
next_page_url = next_page_element.get('href')
url = urljoin(url, next_page_url)
else:
break
This is my ansible-playbook:
---
- name: create s3 bucket and upload static website content into it
hosts: localhost
connection: local
tasks:
- name: create a s3 bucket
amazon.aws.aws_s3:
bucket: testbucket393647914679149
region: ap-south-1
mode: create
- name: create a folder in the bucket
amazon.aws.aws_s3:
bucket: testbucket393647914679149
object: /my/directory/path
mode: create
- name: Upgrade pip
pip:
name: pip
version: 21.1.3
- name: install virtualenv via pip
pip:
requirements: /root/ansible/requirements.txt
virtualenv: /root/ansible/myvenv
virtualenv_python: python3.6
environment:
PATH: "{{ ansible_env.PATH }}:{{ ansible_user_dir }}/.local/bin"
- name: Run script to crawl the website
script: /root/ansible/beautiful_crawl.py
- name: copy file into bucket folder
amazon.aws.aws_s3:
bucket: testbucket393647914679149
object: /my/directory/path/web_content.text
src: web_content.text
mode: put
Problem is when I run this, it runs fine upto the task name: install virtualenv via pip and then throws following error while executing the task name: Run script to crawl the website:
fatal: [localhost]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 2, "stderr": "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-9798 3643645466/beautiful_crawl.py: line 1: import: command not found\n/root/.ansible /tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: lin e 2: from: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306- 13026-97983643645466/beautiful_crawl.py: line 3: import: command not found\n/roo t/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_cra wl.py: line 4: from: command not found\n/root/.ansible/tmp/ansible-tmp-162513770 0.8854306-13026-97983643645466/beautiful_crawl.py: line 6: url: command not foun d\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beauti ful_crawl.py: line 7: file_name: command not found\n/root/.ansible/tmp/ansible-t mp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10: syntax e rror near unexpected token
('\n/root/.ansible/tmp/ansible-tmp-1625137700.885430 6-13026-97983643645466/beautiful_crawl.py: line 10:
response = requests.get (url)'\n", "stderr_lines": ["/root/.ansible/tmp/ansible-tmp-1625137700.8854306-1 3026-97983643645466/beautiful_crawl.py: line 1: import: command not found", "/ro ot/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_cr awl.py: line 2: from: command not found", "/root/.ansible/tmp/ansible-tmp-162513 7700.8854306-13026-97983643645466/beautiful_crawl.py: line 3: import: command no t found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-9798364364546 6/beautiful_crawl.py: line 4: from: command not found", "/root/.ansible/tmp/ansi ble-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 6: url: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97 983643645466/beautiful_crawl.py: line 7: file_name: command not found", "/root/. ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl. py: line 10: syntax error near unexpected token('", "/root/.ansible/tmp/ansibl e-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10:
response = requests.get(url)'"], "stdout": "", "stdout_lines": []}
What am I doing wrong here?
Upvotes: 2
Views: 3064
Reputation: 4574
You have multiple problems.
Check the documentation.
No. 1: The script
modules will run bash
scripts by default, not python
scripts. If you want to run a python script, you need to add a shebang like #!/usr/bin/env python3
as the first line of the script or use the executable
parameter.
No 2: You create a venv
, so I assume you want to run the script in that venv
. You can't do that out of the box with the script
module, so you would need to work around that.
This should work for you (you don't need the shebang, as you tell the script module to run it with python in the venv using the executable
parameter):
- name: Run script to crawl the website
script: /root/ansible/beautiful_crawl.py
executable: /root/ansible/myvenv/bin/python
Upvotes: 1