TheAmazingHAzza

Reputation: 116

Scraping data-* attributes from a web page

I need some help using Python to scrape some data-* attributes from a site. I have tried lxml and requests with no luck, and online I found some articles about using Beautiful Soup. The only problem is that I am not sure how to use it.

Here is what I would like to scrape.

<div class="card-body ">

<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div> 

I am trying to extract the data-var5 values, but I have no idea how. I hope someone can help.

Regards,

Hazza

Upvotes: 2

Views: 908

Answers (2)

Humayun Ahmad Rajib

Reputation: 1560

You can use select() with a CSS attribute selector. Try this:

from bs4 import BeautifulSoup
html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div> 
"""

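soup = BeautifulSoup(html, "lxml")  # needs the lxml package; "html.parser" also works
# CSS attribute selector: every <div> that has a data-var5 attribute
data_var = soup.select('div[data-var5]')

for data in data_var:
    # Tag attributes are read like dictionary keys
    print("data-var5: " + data['data-var5'])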

The output will be:

data-var5: 9
data-var5: 7
data-var5: 3
data-var5: 9
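If installing bs4 or lxml is not an option, the standard library's html.parser can collect the same values. This is a minimal sketch, assuming the markup from the question; the class name DataAttrCollector is made up for illustration:

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Collect one attribute (data-var5 by default) from every <div> that has it."""

    def __init__(self, attr_name="data-var5"):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases names
        if tag == "div":
            for name, value in attrs:
                if name == self.attr_name:
                    self.values.append(value)


markup = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">…</div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">…</div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div>
"""

collector = DataAttrCollector()
collector.feed(markup)
print(collector.values)  # ['9', '7', '3', '9']
```

Passing a different name, e.g. DataAttrCollector("data-var3"), collects that attribute instead. The values come back as strings, just like with Beautiful Soup.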

Upvotes: 1

Artyom Vancyan

Reputation: 5390

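Use find_all() to get the card-entry divs, then read each div's data-var5 attribute as you would a dictionary key:

from bs4 import BeautifulSoup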

html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div> 
"""

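soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install needed
# A string as the second argument matches the CSS class, like class_="card-entry"
divs = soup.find_all("div", "card-entry")
for div in divs:
    print(div["data-var5"])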
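The snippet above parses a hard-coded string. To pull the values from the live site, the downloaded page can be passed to the same parser. This is a minimal sketch, assuming requests and bs4 are installed and that the page is served as static HTML (divs added later by JavaScript will not be in the downloaded source); the function names and the URL are hypothetical placeholders:

```python
import requests
from bs4 import BeautifulSoup


def extract_data_var5(markup):
    """Return the data-var5 values of every card-entry div, converted to int."""
    soup = BeautifulSoup(markup, "html.parser")
    # Only divs with class card-entry that actually carry a data-var5 attribute
    return [int(div["data-var5"]) for div in soup.select("div.card-entry[data-var5]")]


def fetch_data_var5(url):
    """Download the page at url and extract its data-var5 values (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_data_var5(response.text)
```

Call it as fetch_data_var5("https://example.com/cards") with the real page's URL. If the list comes back empty even though the browser shows the cards, they are probably rendered by JavaScript and are not present in the HTML that requests receives.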

Upvotes: 1
