Reputation: 127
I'm trying to store data scraped from a website. There are more than 100 URLs and they are all very similar, so I tried to build them with a %s placeholder in my code.
Example URLs:
https://www.yahoo.com/lifestyle/tagged/food,
https://www.yahoo.com/lifestyle/tagged/sports,
https://www.yahoo.com/lifestyle/tagged/usa,
https://www.yahoo.com/lifestyle/tagged/health and so on.
My Django+Bs4 Loop:
from django.core.management.base import BaseCommand
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from scraping.models import Job
import requests as req

header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

class Command(BaseCommand):
    def handle(self, *args, **options):
        TAGS = ['economy', 'food', 'sports', 'usa', 'health']
        resp = req.get('https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS),headers=header)
        soup = BeautifulSoup(resp.text, 'lxml')
        for i in range(len(soup)):
            titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
            print (titles)
Error message is:
TypeError: not all arguments converted during string formatting
I have been experimenting with loops, but I am very new to this and cannot work out how to loop over the tags. What am I missing here? Could someone more knowledgeable point me in the right direction? Many thanks.
Upvotes: 0
Views: 294
Reputation: 1817
You can loop through your tags to send a request for each tag.
import requests

header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    # One request per tag; the f-string inserts a single tag into the URL
    resp = requests.get(f"https://www.yahoo.com/lifestyle/tagged/{tag}", headers=header)
    print(len(resp.text))
# Output:
#341723
#442712
#447413
#368508
#445326
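With 100+ tags it may be easier to build the list of URLs first and request them afterwards. Below is a minimal sketch of that idea, not a definitive implementation. The names `BASE_URL` and `tag_urls` are made up for illustration, and percent-encoding the tag with `urllib.parse.quote` is my own assumption in case a tag ever contains a space or other special character.

```python
from urllib.parse import quote

# Common prefix shared by all the tag pages in the question
BASE_URL = "https://www.yahoo.com/lifestyle/tagged/"


def tag_urls(tags):
    """Return one URL per tag (hypothetical helper, not part of any library)."""
    # quote() percent-encodes characters that are not safe in a URL path
    return [BASE_URL + quote(tag) for tag in tags]


urls = tag_urls(["economy", "food", "sports", "usa", "health"])
for url in urls:
    print(url)
```

Each entry of `urls` can then be passed to `requests.get(url, headers=header)` inside the same kind of loop as above.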
Upvotes: 1
Reputation: 23079
It appears that you want to insert each of the values in TAGS individually and perform a request for each of them. So you need to loop over TAGS and submit a request for each one. I expect that you want something like this:
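TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    # req, header and BeautifulSoup come from the imports in the question
    resp = req.get(f'https://www.yahoo.com/lifestyle/tagged/{tag}', headers=header)
    soup = BeautifulSoup(resp.text, 'lxml')
    # <process the page>, e.g. the title lookup from the question:
    titles = soup.find_all("div", {"class": "StretchedBox Z(1)"})
    print(titles)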
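As for why the original line fails: the `%` operator formats the whole collection at once instead of one tag at a time. The sketch below is only an illustration, with shortened two-item collections and variable names of my own. As far as I can tell, the exact message from the question is what `%` raises when its right-hand side is a tuple with more items than placeholders, while a list is treated as a single value and its repr ends up in the URL. Either way, no URL for an individual tag is produced.

```python
BASE = "https://www.yahoo.com/lifestyle/tagged/%s"

tags_list = ["economy", "food"]
tags_tuple = ("economy", "food")

# A list counts as ONE value, so its repr is pasted into the URL
as_list = BASE % tags_list
print(as_list)

# A tuple is unpacked into separate arguments: two values, one %s
try:
    BASE % tags_tuple
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Formatting a single tag works as intended
single = BASE % "food"
print(single)
```

This is why the loop is needed: each pass formats exactly one tag into the URL.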
Upvotes: 1