Reputation: 48
I wrote a simple program that scrapes a game database with a separate page for every item. It iterates over a list of names and parses the page for each item (246 pages in total). But the more items I request, the lower the chance it will work.
For example, if I iterate over only 100 items, the code runs in 9 seconds... but if I run it again, it finishes in 1.5 seconds. Sometimes it just freezes and my debugger doesn't even show me why. If I pass 150+ names, it never works. It looks like I'm stuck in an endless loop. With the magic of print(123) I discovered that it stops on responses = await asyncio.gather(*tasks), but I have no idea what the problem is.
So, what's the reason for that? Did I do something wrong, or is the site just ignoring my "DDoS attack"?
This is a very strange problem to me, considering I've seen programs doing tens of thousands of requests.
Much obliged.
import requests
import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = 'https://www.thecycledb.com/items' # website url
data = ['Hammer', 'Shattergun', 'Advocate', 'S-576 PDW', 'Kinetic Arbiter', 'Bulldog', 'KOR-47'] # names of every item (short version)
list = []

def get_tasks(session): # creating a task for every item
    tasks = []
    for i in data:
        tasks.append(asyncio.create_task(session.get(url[:-1]+'/'+i.lower().replace("'",'').replace(' ','-')))) # iterate every URL
    return tasks

async def parse():
    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session) # Tasks
        responses = await asyncio.gather(*tasks) # where the program stops
        for i in responses: # Parsing process itself, it works okay
            item_soup = BeautifulSoup(await i.text(),'lxml')
            try:
                a = item_soup.find('h3',string='Shop Price')
                list.append(int(a.next_element.next_element.text.replace(',','')))
            except(AttributeError):
                list.append(0)
    print(list)

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) # keeps a RuntimeError away
asyncio.run(parse())
Output (as it should be):
[73000, 54000, 76000, 1200, 412000, 6400, 210000]
Here is the entire data list if you want to try running this:
data = ['Hammer', 'Shattergun', 'Advocate', 'S-576 PDW', 'Kinetic Arbiter', 'Bulldog', 'KOR-47', 'Rusty K-28', 'Rusty B9 Trenchgun', 'KARMA-1', "KM-9 'Scrapper'", 'C-32 Bolt Action', 'Lacerator', 'Voltaic Brute', 'PKR Maelstrom', 'B9 Trenchgun', 'Basilisk', 'K-28', 'Binocular', 'Rusty S-576 PDW', 'Rusty AR-55 Autorifle', 'Scarab', 'Phasic Lancer', 'Manticore', 'ICA Guarantee', 'Gorgon', 'Heavy Mining Tool', 'KOMRAD', 'KBR Longshot', 'Asp Flechette Gun', 'Zeus Beam', 'Flashlight', 'Mineral Scanner', 'AR-55 Autorifle', 'Audio Decoy', 'Combat Stim', 'Weak Stim', 'Grenade', 'Combat Medkit', 'Strong Stim', 'Strong Medkit', 'Smoke Grenade', 'Gas Grenade', 'Weak Medkit', 'Large Backpack', 'Small Backpack', 'Heavy Duty Backpack', 'Worn Emergency Bag', 'Medium Backpack', 'Epic Helmet', 'Common Tactical Helmet', 'Uncommon Tactical Helmet', 'Common Helmet', 'Rare Helmet', 'Rare Tactical Helmet', 'Exotic Helmet', 'Rare Restoration Helmet', 'Legendary NV Helmet', 'Uncommon Helmet', 'Uncommon Restoration Helmet',
'Marauder Head', 'ICA Scrip', 'NiC Oil Cannister', '"Magic-GROW" Fertilizer', 'Heavy Strider Flesh', 'Savage Marauder Flesh', 'Resin Gun', 'Smart Mesh', 'Heavy Strider Head', 'Crusher Hide', 'Mature Rattler Head', 'Crusher Flesh', 'Pure Focus Crystal', '"Fusion Cartridge" Batteries ', 'Salvaged Insulation', 'Rattler Skin', 'Autoloader', 'Spinal Base', 'Mature Rattler Eyes', 'Portable Lab', 'Shard Slicer', 'Copper Wire', 'Hardened Metals', 'Interactive Screen', 'Meteor Core', 'Electronic Cables', 'Shock Absorber', 'Meteor Fragment', 'Titan Ore', 'Clear Veltecite', 'Crusher Head', 'Miniature Reactor', 'Hydraulic Piston', 'Blue Runner Egg', 'Circuit Board', 'Marauder Flesh', 'Strider Head', 'Korolev Scrip', 'Nutritional Bar', 'Dustbloom', 'Letium Clot', 'Radio Equipment', 'Polymetallic Prefabricate', 'Biological Sampler', 'Old Medicine', 'Print Resin', 'Zero Systems CPU', 'Master Unit CPU', 'Ball Bearings', 'Altered Nickel', 'Veltecite Heart', 'Toxic Glands', 'Brittle Titan Ore', 'Nickel', 'Pure Veltecite', 'Aluminum scrap', 'Alpha Crusher Skull', 'Sample Container', 'Medical Supplies', 'Charged Spinal Base', 'Rattler Head', 'Optic Glass', 'Metallic Alloys', 'Old Currency', 'Brightcap Mushroom', 'Jewellery', 'Osiris Scrip', 'Pale Ivy Blossom', 'Waterweed Filament', 'Focus Crystal', 'Textiles', 'Indigenous Fruit', 'Rattler Eyes', 'Alpha Crusher Heart', 'Flawed Veltecite', 'Hardened Bone Plates', 'Strider Flesh', 'Glowy Brightcap Mushroom', 'Co-TEC MultiTool', 'Azure Tree Bark', 'Compound Sheets', 'Gyroscope', 'Savage Marauder Head', 'Magnetic Field Stabilizer', 'Derelict Explosives', 'Cloudy Veltecite', 'Ultralight Stock', 'Tactical Stock', 'Light Converter', '4x Optic', 'MKM Ultralight Stock', 'Small Suppressor', 'Medium Creature Dmg', 'Titan Ore Scanning Module', 'Standard Stock', 'Ergonomic Grip', 'Shotgun Slugs', 'Shotgun Quickdraw', 'Tactical Foregrip', 'Quickdraw Stock', 'Heavy Quickdraw', '8x Optic', 'Medium Muzzle Brake', 'Crude Oil Scanning Module',
'Focus Crystal Scanning Module', 'MKM Tactical Stock', 'Holographic Sight', 'Quickdraw Foregrip', 'Light Extended Quickdraw', 'Shotgun Extended', '2x Optic', 'Veltecite Scanning Module', 'Small Muzzle Brake', 'Medium Quickdraw', 'Angled Foregrip', 'Marksman Stock', 'Tactical Rear Grip', 'Medium Suppressor', 'Shotgun Converter', 'Medium Converter', 'Light Quickdraw', 'Medium Extended Quickdraw', 'Light Creature Dmg', 'Heavy Converter', 'Red Dot Sight', '6x Optic', 'MKM Quickdraw Stock', 'Medium Extended', 'Heavy Extended', 'Light Extended', 'Tactical Light', '2 - 4x Variable Optic', 'Common Shield', 'Uncommon Shield', 'Rare Tactical Shield', 'Rare Shield', 'Common Tactical Shield', 'Epic Shield', 'Uncommon Tactical Shield', 'Uncommon Restoration Shield', 'Rare Restoration Shield', 'Exotic Shield', 'Janitors Key', 'Armory Key', 'Loose House Key', 'Lab Keycard', 'Bar Storage Key', 'Bright Sands Observation Room Key', 'Community Room', 'Overseers Office', 'Tall House Key', 'Mine Access Key', "Boss' Office", 'Garage Office', 'Server Access Key', 'Luggage Saferoom Key', 'Skeleton Key', 'Letium Bio Samples', 'Uncommon Data Drive', 'Notes on Meteor Experiment - 1', 'Letium Coated Helmet', 'Miner Cam #D027', 'Old Bones', 'Data Drive', 'Notes on Meteor Experiment - 2', 'Oil Pump Part', 'Warden Skull', 'Miner Cam #2F53', 'Orbital Cannon Beacon', 'Rare Data Drive', 'Unique Data Drive', 'Miner Cam #A45D', 'Valuable Data Drive', '"Dig Site" Data Drive', 'Flight Recorder', 'Laser Drill Control Unit', 'Oil Pump Beacon', 'Laser Drill Beacon', 'Alpha Crusher Bait', 'Old Notebook', 'Sign of Life from Stranded Prospector', 'Medium ammo', 'Special ammo', 'Light ammo', 'Shotgun ammo', 'Heavy ammo']
Upvotes: 1
Views: 728
Reputation: 195613
Try limiting the number of concurrent requests, for example with asyncio.Semaphore, so that you don't flood the server:
import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = "https://www.thecycledb.com/items"
item_url = "https://www.thecycledb.com/item/{}"

data = [
    "Hammer",
    "Shattergun",
    "Advocate",
    "S-576 PDW",
    "Kinetic Arbiter",
    "Bulldog",
    "KOR-47",
]  # names of every item (short version)

lst = []

async def download(session, sem, u):
    async with sem:
        rv = await session.get(u)
        print(f"Downloading {u} done")
        return rv

def get_tasks(session, sem):
    tasks = []
    for i in data:
        item_name = i.lower().replace("'", "").replace(" ", "-")
        u = item_url.format(item_name)
        tasks.append(download(session, sem, u))
    return tasks

async def parse():
    sem = asyncio.Semaphore(2)  # <-- limit to max 2 parallel downloads
    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session, sem)
        responses = await asyncio.gather(*tasks)
        for i in responses:  # Parsing process itself, it works okay
            item_soup = BeautifulSoup(await i.text(), "lxml")
            try:
                a = item_soup.find("h3", string="Shop Price")
                lst.append(
                    int(a.next_element.next_element.text.replace(",", ""))
                )
            except (AttributeError):
                lst.append(0)
    print(lst)

asyncio.run(parse())
Prints:
Downloading https://www.thecycledb.com/item/shattergun done
Downloading https://www.thecycledb.com/item/hammer done
Downloading https://www.thecycledb.com/item/advocate done
Downloading https://www.thecycledb.com/item/s-576-pdw done
Downloading https://www.thecycledb.com/item/kinetic-arbiter done
Downloading https://www.thecycledb.com/item/bulldog done
Downloading https://www.thecycledb.com/item/kor-47 done
[73000, 54000, 76000, 1200, 412000, 6400, 210000]
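To check that the semaphore really caps the number of requests in flight, here is a minimal offline sketch. It makes no network calls: fake_download, run_all and the stats dictionary are hypothetical names invented for this illustration, and asyncio.sleep stands in for session.get. It only shows the throttling pattern, not the site's actual behaviour.

```python
import asyncio

async def fake_download(sem, stats, name):
    # Hypothetical stand-in for session.get(): sleeps instead of using the network.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated request latency
        stats["active"] -= 1
    return name

async def run_all(names, limit):
    sem = asyncio.Semaphore(limit)  # at most `limit` downloads at once
    stats = {"active": 0, "peak": 0}
    # gather() returns results in the same order as the awaitables passed in
    results = await asyncio.gather(
        *(fake_download(sem, stats, n) for n in names)
    )
    return results, stats["peak"]

names = [f"item-{i}" for i in range(20)]
results, peak = asyncio.run(run_all(names, limit=2))
print(f"{len(results)} downloads finished, peak concurrency: {peak}")
```

All 20 coroutines are scheduled up front, but the peak never rises above the limit passed to the semaphore. Raising the limit trades politeness towards the server for speed, so it is worth starting low and increasing it only while the requests keep succeeding.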
Upvotes: 2