rbutrnz

Reputation: 393

How to grab data fields from a dynamic URL using Selenium in Python

I can extract the text of each row from the URL below, but the URLs behind the Chart and Holders links are missing from my output.

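import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Passing the driver path positionally is Selenium 3 style;
# recent Selenium 4 releases expect a Service object instead.
driver = webdriver.Chrome('chromedriver.exe')
url = 'https://poocoin.app/tokens/0xe56842ed550ff2794f010738554db45e60730371'
driver.get(url)

# Give the page time to render its JavaScript content
time.sleep(8)
soup = BeautifulSoup(driver.page_source, 'lxml')

# get_text() returns only the visible text, so the link hrefs are lost
data = soup.find('div', class_='overflow-auto unpad-3 ps-3').get_text()
print(data)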

Current Output:

Pc v2 | BIN/BNB LP Holdings: 4,694.84 BNB ($2,221,326) | Chart | Holders
Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0) | Chart | Holders
Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0) | Chart | Holders

Wanted Output:

Pc v2 | BIN/BNB LP Holdings: 4,697.12 BNB ($2,226,112)
    | Chart     https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
    | Holders   https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0)
    | Chart     https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
    | Holders   https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0)
    | Chart     https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
    | Holders   https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances

Upvotes: 0

Views: 105

Answers (2)

pmadhu

Reputation: 3433

Try looping over each row and reading the href of its Chart and Holders links:

soup = BeautifulSoup(driver.page_source, 'html5lib')

# Each pool is rendered as its own row div
rows = soup.find_all('div', class_='text-xs my-3')
for row in rows:
    data = row.get_text()
    # Select each link by its visible text, then read its href attribute
    chart = "Chart: {}".format(row.find('a', text=['Chart']).attrs['href'])
    holder = "Holders: {}".format(row.find('a', text=['Holders']).attrs['href'])
    print(data)
    print(chart)
    print(holder)

Output:

Pc v2 | BIN/BNB LP Holdings:4,708.86 BNB ($2,239,013) | Chart | Holders
Chart: https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
Holders: https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
Pc v2 | BIN/BUSD LP Holdings:0.03 BUSD ($0) | Chart | Holders
Chart: https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
Holders: https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
Pc v2 | BIN/USDT LP Holdings:0.00 USDT ($0) | Chart | Holders
Chart: https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
Holders: https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances
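
If a row lacks one of the two links, row.find(...) returns None and the .attrs lookup raises an AttributeError. A minimal sketch of a guarded lookup follows, run against a hand-written snippet rather than the live page. The markup, the example.com URLs, and the helper names link_for and extract_links are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one complete row and one row with no Holders link.
SAMPLE_HTML = """
<div class="text-xs my-3">
  <a href="https://example.com/pair-a" target="_blank">Pc v2</a> | BIN/BNB LP Holdings:
  <span>1.00 BNB</span>
  | <a href="https://example.com/chart-a">Chart</a>
  | <a href="https://example.com/holders-a">Holders</a>
</div>
<div class="text-xs my-3">
  <a href="https://example.com/pair-b" target="_blank">Pc v2</a> | BIN/BUSD LP Holdings:
  <span>0.03 BUSD</span>
  | <a href="https://example.com/chart-b">Chart</a>
</div>
"""


def link_for(row, label):
    """Return the href of the <a> whose text equals label, or None if absent."""
    anchor = row.find("a", string=label)
    return anchor.get("href") if anchor is not None else None


def extract_links(html):
    """Collect the Chart and Holders hrefs for every row div in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("div", class_="text-xs my-3")
    return [
        {"chart": link_for(row, "Chart"), "holders": link_for(row, "Holders")}
        for row in rows
    ]


results = extract_links(SAMPLE_HTML)
for result in results:
    print(result)
```

With the real page, the same helper could be called on the soup built from driver.page_source.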

Upvotes: 1

Bhavya Parikh

Reputation: 3400

To collect the links in one line, call find_all for a tags on the container and pass the link text so that only the Chart and Holders links are selected:

all_links=[ i['href'] for i in soup.find('div', class_='overflow-auto unpad-3 ps-3').find_all("a",text=['Chart','Holders'])]

Output:

['https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics',
 'https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances',
 'https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics',
 'https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances',
 'https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics',
 'https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances']
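
The flat list alternates between Chart and Holders links, so it can be regrouped into one pair per pool. This is a small sketch that assumes every row has both links, in that order. The URLs are placeholders, not the ones from the page.

```python
# Placeholder list in the same alternating order: Chart, Holders, Chart, Holders.
all_links = [
    "https://example.com/chart-a",
    "https://example.com/holders-a",
    "https://example.com/chart-b",
    "https://example.com/holders-b",
]

# Even indexes are Chart links, odd indexes are Holders links.
pairs = list(zip(all_links[0::2], all_links[1::2]))

for chart, holders in pairs:
    print("Chart:", chart)
    print("Holders:", holders)
```

If a row could be missing a link, this pairing would drift out of step, and a per-row loop is the safer choice.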

To match your wanted output:

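rows = soup.find('div', class_='overflow-auto unpad-3 ps-3').find_all("div", class_="text-xs my-3")
for row in rows:
    # Text of the first link in the row
    print(row.find("a", attrs={"target": "_blank"}).get_text(), end="")
    # The first two text nodes after that link carry the rest of the row label
    print(" ".join(row.find("a").find_next_siblings(text=True)[:2]), end="")
    # The holdings amount sits in the span
    print(row.find("span").get_text())
    # Link text followed by its href, for the Chart and Holders links only
    links = [a.get_text() + " " + a['href'] for a in row.find_all("a", text=['Chart', 'Holders'])]
    print(*links, sep="\n")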

Output:

Pc v2 | BIN/BNB LP Holdings: 4,716.76 BNB ($2,234,449)
Chart https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
Holders https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0)
Chart https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
Holders https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0)
Chart https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
Holders https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances
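
The wanted output in the question also indents the link lines and aligns the URLs. One possible way to get that layout is sketched below on a hand-written row. The markup, the example.com URLs, and the helper name format_row are assumptions, and the padding width of 9 is simply chosen to match the spacing shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a single rendered row.
SAMPLE_HTML = """
<div class="text-xs my-3">
  <a href="https://example.com/pair" target="_blank">Pc v2</a> | BIN/BNB LP Holdings:
  <span>1.00 BNB ($100)</span>
  | <a href="https://example.com/chart">Chart</a>
  | <a href="https://example.com/holders">Holders</a>
</div>
"""


def format_row(row):
    """Return the summary line plus one indented, aligned line per link."""
    # Join the row's text pieces with single spaces.
    text = row.get_text(" ", strip=True)
    # Keep only the part before the link labels start.
    summary = text.split(" | Chart", 1)[0]
    lines = [summary]
    for anchor in row.find_all("a", string=["Chart", "Holders"]):
        label = anchor.get_text(strip=True)
        # Left-align the label in 9 columns so the URLs line up.
        lines.append(f"    | {label:<9} {anchor['href']}")
    return lines


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
formatted = []
for row in soup.find_all("div", class_="text-xs my-3"):
    formatted.extend(format_row(row))

print("\n".join(formatted))
```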

Upvotes: 2
