Parsing out text without a tag

Question

I have been trying to parse out text that is not wrapped in any tag. I want to build a small scraping tool to help me find good D&D games to play on Roll20 (the eventual goal is to take this data and put it in a table next to each game's link).

The URL I am parsing is here: Roll20 Link

I had an idea to try to parse out the text and then put each new line into a list of its own and grab the elements needed. I wanted to grab the info on the game, current players, and current open slots. Here is the code I have done so far. Any suggestions on what I might need to do to scrape this particular data?

Here is my code:

import requests
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'}
url = r'https://app.roll20.net/lfg/search//?page=0&days=thursday,friday&dayhours=1652932800,1653019200&frequency=onceweekly,biweekly,monthly&timeofday=&timeofday_seconds=&language=English&avpref=Any&gametype=Any&newplayer=false&yesmaturecontent=false&nopaytoplay=false&playingstructured=dnd_next&sortby=relevance&for_event=&roll20con='
r = requests.get(url, headers = headers)
soup = BeautifulSoup(r.text, 'html.parser')

time.sleep(2)

games= soup.find_all('tr', {'class': 'lfglisting'})

game_urls = []

for item in games:
    # item_title = item.find('a', {'class': 'lfglistingname'}).text
    # item_url = 'https://app.roll20.net' + item.find('a', {'class': 'lfglistingname'})['href']
    current_players = item.get_text("\n", strip=True)
    print(current_players)
    # try:
    #   item_game = item.find('strong', {'class': 'label label-success'}).text
    # except:
    #   item_game = 'Role-Playing Game'
    # try: 
    #   item_pay = item.find('strong', {'class': 'label label-danger'}).text
    # except:
    #   item_pay = 'Free to Play'
    # try:
    #   item_welcome = item.find('strong', {'class': 'label label-info'}).text
    # except:
    #   item_welcome = 'Experts Only'
    # print(f"Game: {item_title}. URL: {item_url}. Notes on Game: {item_game}, {item_pay}, {item_welcome}")
    # game_urls.append(item_url)

# print(game_urls)

Yarin_007 · Accepted Answer

I started by looking at the source code of the page and searching for a known string (such as part of a game description). Every description seems to sit inside its own element, but the parent element, the tr row, is more interesting because it contains all the desired data. Notice that all of these rows have something in common: the data-listingid attribute.

So let's get all of those.

for x in soup.select('tr[data-listingid]'):
    print(x.text.strip())
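
To see what the attribute selector does in isolation, here is a minimal sketch run against a small inline table instead of the live page. The markup is hypothetical; only the data-listingid attribute name is taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the data-listingid attribute mirrors the real page.
html = """
<table>
  <tr><th>Header row without the attribute</th></tr>
  <tr data-listingid="101"><td>First game</td></tr>
  <tr data-listingid="102"><td>Second game</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# tr[data-listingid] matches every <tr> that has the attribute, whatever its value.
rows = soup.select("tr[data-listingid]")
listing_ids = [row["data-listingid"] for row in rows]
print(listing_ids)
```

The header row is skipped because it has no data-listingid attribute, which is why this selector is a convenient way to isolate the listings.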

Then we start parsing with regex.

import re

def print_data(dct):   
    for item, amount in dct.items():  
        print(f"{item} {'-'*(30 - len(item))} {amount}")


soup = BeautifulSoup(r.text, 'html.parser')

listings = soup.select('tr[data-listingid]')

listings_count = len(listings)

print (f"Expecting {listings_count} listings")
parsed_listings = []

for listing in listings:
    game = listing.text.strip()    
    try:
        # Raw strings keep the regex escapes (\n, \() intact.
        name = re.search(r"\n{6}(.*)", game).group(1)
        info = re.search(r"\n{3} (.*)", game).group(1) + "..."
        current_players = re.search(r"(.*) Current Players", game).group(1)
        open_slots = re.search(r"\((.*) Open Slots", game).group(1)
        game = {"Name": name, "Info": info, "Current_Players": current_players, "Open_Slots": open_slots}
        parsed_listings.append(game)
        print_data(game)
        print ("\n=======\n")
    except AttributeError:
        # re.search returned None for one of the fields, so skip this listing.
        pass

print (f"parsed {len(parsed_listings)} of {listings_count} total")

Gives:

Expecting 30 listings
Name -------------------------- Curse of Strahd - Grim Hollow/High RP
Info -------------------------- Take this opportunity to play the most popular D&D module ever made with an expert DM who cares about your backstory and wants to...
Current_Players --------------- 1
Open_Slots -------------------- 5

=======

Name -------------------------- The Dragon of Icespire Peak (Monday)
Info -------------------------- Dragon of Icespire Peak is the introductory adventure for the 5th Edition Starter Set, designed for PC levels 1 – 6. It is a...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Necropolis
Info -------------------------- What ancient horrors lie slumbering in a newly discovered tomb deep in Egypt's Valley of the Kings? Are you allowing local superstitions and the...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- dragonball z 
Info -------------------------- hello all those to whom love dragonball z! i have never DM before but i am willing to give it a chance. im trying...
Current_Players --------------- 1
Open_Slots -------------------- 3

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- Larula's Tomb
Info -------------------------- 3 Hour, Level 3 One Shot. Gritty, old school feel. Death possible. Backup characters provided. Roll 3d6 straight for stats. Roll for HP. The...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Vast Stories of Erstonia
Info -------------------------- Vast Stories of Erstonia is a D&D 5e group devoted to playing a series of oneshots provided by the DM. The adventures will be...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Beasts of Fortune 2
Info -------------------------- The Beasts of Fortune seeks adventures seeking fame, fortune, honor, or just a reason to smack some heads, come one come all to join...
Current_Players --------------- 1
Open_Slots -------------------- 20

=======
...
parsed 22 of 30 total

This is by no means a perfect solution: the parsing misses some listings (only 22 of the 30 were parsed here), but it should get you going.
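
One reason listings get dropped is that a single failed match discards the whole listing. A more tolerant variant might parse each field on its own and store None when a field is missing. The sketch below is only that, a sketch: the sample text is hypothetical (shaped like what get_text("\n", strip=True) could return), the helper names are made up for illustration, and treating the first line as the game name is an assumption.

```python
import re

# Hypothetical listing text; the real page layout may differ.
SAMPLE = """Curse of Strahd - Grim Hollow/High RP
Take this opportunity to play the most popular D&D module ever made
1 Current Players
(5 Open Slots)"""

PLAYERS_RE = re.compile(r"(\d+)\s+Current Players")
SLOTS_RE = re.compile(r"\((\d+)\s+Open Slots")

def first_int(pattern, text):
    """Return the first captured number as an int, or None if nothing matches."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None

def parse_listing(text):
    # Keep only non-empty lines; taking line 0 as the name is an assumption.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "Name": lines[0] if lines else None,
        "Current_Players": first_int(PLAYERS_RE, text),
        "Open_Slots": first_int(SLOTS_RE, text),
    }

parsed = parse_listing(SAMPLE)
print(parsed)
```

With this shape, a listing that lacks an "Open Slots" line still yields a record, just with Open_Slots set to None, so you can decide later whether to keep it.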

Of course, run this over each page number you want (the ?page=0 parameter in the URL). If you want the full description of a listing, you will have to send a separate GET request for it, using the link in its Read More tag.
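
For the paging part, one way to produce the URL for each page is to rewrite the page query parameter with the standard library. This is a rough sketch: page_url is a hypothetical helper, the base URL is shortened to a few of the original parameters, and how many pages actually exist is something you would have to check on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    # Drop any existing page parameter, keep the rest (including blank values).
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    query.insert(0, ("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Shortened version of the search URL from the question.
base = "https://app.roll20.net/lfg/search//?page=0&language=English&sortby=relevance"
urls = [page_url(base, n) for n in range(3)]
for u in urls:
    print(u)
```

Each of these URLs could then be passed to requests.get with the same headers as before, ideally with a short pause between requests.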

Note that listing.text cannot give you the Read More link, because it keeps only the text and strips away the tags and their attributes.
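
To get at a link, you would read the href attribute from the anchor element before falling back to plain text. The snippet below is a sketch on hypothetical markup: the class name, link path, and teaser text are placeholders, not the real Roll20 structure.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical row: the class name, link text, and path are placeholders.
html = """
<table><tr data-listingid="101">
  <td>A short teaser... <a class="readmore" href="/lfg/listing/101/example">Read More</a></td>
</tr></table>
"""

row = BeautifulSoup(html, "html.parser").select_one("tr[data-listingid]")
print(row.text.strip())  # plain text only: the href is not part of it

# Take the first anchor in the row that actually has an href attribute.
link = row.find("a", href=True)
detail_url = urljoin("https://app.roll20.net", link["href"]) if link else None
print(detail_url)
```

On the real page you would need to inspect the row to find which anchor is the Read More link, then request detail_url and parse the full description from that response.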

Also, this isn't legal advice or anything, but I wouldn't be surprised if this is against their site policy, so be wary.
