ptsv

Reputation: 23

Scraping multiple pages in Steam with BeautifulSoup

My goal is to scrape information about Action games, such as the name, tags, and price. I am using the requests and BeautifulSoup libraries. URL: https://store.steampowered.com/tags/en/Action/#p=0&tab=ConcurrentUsers

I got it working for the first page and then tried to scrape 15 pages. My plan was to replace "/Action/#p=0" with "/Action/#p=1" in the URL and send a GET request, expecting an HTML response with the games from the next page. This did not work: even with "#p=15" I get the HTML for the first page. I then inspected the pagination elements (1, 2, 3, 4, ...), but they do not contain any links. Next, I looked in "Inspect > Network tab" for a request that returns the HTML of the next page, and I found one. Its response did contain the games from the next page. URL for the second page: https://store.steampowered.com/contenthub/querypaginated/tags/ConcurrentUsers/render/?query=&start=15&count=15&cc=BG&l=english&v=4&tag=Action&tagid=19

In that URL, "start" is the offset of the first game, so "start=15" with "count=15" corresponds to page 2. Unfortunately, the content is unusable because the tag hierarchy is broken. For example:

           <span class="top_tag">
            FPS
           </span>
           <span class="top_tag">
            , Shooter
           </span>

comes out as:

       <span class='\"top_tag\"'>
        FPS&lt;\/span&gt;
        <span class='\"top_tag\"'>
         , Shooter&lt;\/span&gt;

The second span is parsed as a child of the first, when it should be its sibling. Both examples were printed with the soup's prettify() method using utf-8.

Is there a better way to do this? I am aware I could use regex or Selenium, but I wonder if there is a way to do this with just BeautifulSoup and requests.

Upvotes: 1

Views: 691

Answers (1)

Andrej Kesely

Reputation: 195438

The server responds with JSON, so use the response's .json() method to parse it. The HTML fragment is stored under the "results_html" key. For example:

import requests
from bs4 import BeautifulSoup

# Paginated endpoint found in the browser's Network tab
url = "https://store.steampowered.com/contenthub/querypaginated/tags/ConcurrentUsers/render/"

params = {
    "query": "",
    "start": 0,  # offset of the first game on the page
    "count": 15,  # games per page
    "cc": "US",
    "l": "english",
    "v": "4",
    "tag": "Action",
    "tagid": "19",
}


for page in range(5):  # increase the number of pages here
    params["start"] = 15 * page
    response = requests.get(url, params=params)
    response.raise_for_status()  # stop early on an HTTP error
    data = response.json()  # decode the JSON wrapper first
    # "results_html" holds the HTML fragment with the game rows
    soup = BeautifulSoup(data["results_html"], "html.parser")
    for item in soup.select(".tab_item_content"):
        print(
            "{:<40} {}".format(
                item.select_one(".tab_item_name").text,
                item.select_one(".tab_item_top_tags").text,
            )
        )

Prints:

Counter-Strike: Global Offensive         FPS, Shooter, Multiplayer, Competitive
Grand Theft Auto V                       Open World, Action, Multiplayer, Automobile Sim
Lost Ark                                 MMORPG, Free to Play, Action RPG, Hack and Slash
Apex Legends™                            Free to Play, Battle Royale, Multiplayer, Shooter
PUBG: BATTLEGROUNDS                      Survival, Shooter, Multiplayer, Battle Royale
Dota 2                                   Free to Play, MOBA, Multiplayer, Strategy
ELDEN RING                               Souls-like, Relaxing, Dark Fantasy, RPG
Tom Clancy's Rainbow Six® Siege          FPS, Hero Shooter, Multiplayer, Tactical
Vampire Survivors                        Action Roguelike, Pixel Graphics, Bullet Hell, Casual
NARAKA: BLADEPOINT                       Battle Royale, Sexual Content, Multiplayer, Martial Arts
Warframe                                 Free to Play, Action RPG, RPG, Action
Destiny 2                                Free to Play, Open World, Looter Shooter, FPS
Wallpaper Engine                         Mature, Utilities, Software, Anime
Rust                                     Survival, Crafting, Multiplayer, Open World
Dead by Daylight                         Horror, Survival Horror, Multiplayer, Online Co-Op
Brawlhalla                               Free to Play, Multiplayer, Fighting, Casual
Dread Hunger                             Multiplayer, Survival, Online Co-Op, Social Deduction
Stumble Guys                             Action, Casual, 3D, 3D Platformer
ARK: Survival Evolved                    Open World Survival Craft, Survival, Open World, Multiplayer
LEGO® Star Wars™: The Skywalker Saga     LEGO, Adventure, Open World, Multiplayer

...and so on.
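
To see why the spans in the question came out nested, here is a minimal offline sketch using only the standard library. The response body below is a made-up, shortened stand-in for what the endpoint returns (an assumption for illustration, not captured output). It shows that the raw text still carries JSON escapes such as `\"` and `<\/span>`, which an HTML parser cannot read as a closing tag, and that decoding the JSON first yields clean markup.

```python
import json

# Hypothetical, shortened response body: a JSON object whose "results_html"
# field holds an HTML fragment (shape assumed from the answer's code above).
raw_body = r'{"results_html": "<span class=\"top_tag\">FPS<\/span><span class=\"top_tag\">, Shooter<\/span>"}'

# Parsed as HTML directly, the escapes stay in place: "<\/span>" is not a
# valid end tag, so the first <span> is never closed and the second one is
# treated as its child.
print(r"<\/span>" in raw_body)  # True

# Decoding the JSON first (which is what response.json() does) removes the
# escapes and leaves ordinary HTML.
results_html = json.loads(raw_body)["results_html"]
print(results_html)
# <span class="top_tag">FPS</span><span class="top_tag">, Shooter</span>
```

The decoded `results_html` string is what should be passed to BeautifulSoup; the two spans are then parsed as siblings.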

Upvotes: 0
