Jcmoney1010

Reputation: 922

Using Beautiful Soup to get span title attribute

I'm new to Python and Beautiful Soup, and I'm working on a web scraper that grabs the data from this website:

http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742

The webpage is simple, basically just a table, so I'm trying to grab each field within the table. My issue is with the first field: I want the date stored in the span's title attribute rather than the value that is displayed. I can grab a list of the span titles, or I can grab the other two fields, but I can't grab the span title and the other two fields at the same time. Here's an example of what I'm trying to accomplish:

2018-01-20 03:37:00
3.90135252
8ece3baba44382eec3d62fa76b5beba98ae398f81ad2d77556b95c3c1a739b4f

Instead, the best I've managed so far is:

{'title': '2018-01-20 03:57:00'}
2h ago
{'title': '2018-01-20 03:57:00'}
3.90135252
{'title': '2018-01-20 03:57:00'}
8ece3baba44382eec3d62fa76b5beba98ae398f81ad2d77556b95c3c1a739b4f

This is close, but it prints the title before every field, leaves the attribute dictionary in the output, and repeats the same date and time for every single record. What is the best way to achieve the results I'm looking for?

Here is my code:

import requests
import time
from bs4 import BeautifulSoup

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl, headers={'User-Agent':'MyAgent'})
soup = BeautifulSoup(thepage.text, "html.parser")


for table in soup.findAll('td'):
    print(table.text)
    for time in soup.findAll('span'):
        print(time.attrs)
        count =  1
        if count == 1:
            count ==0
            break

Upvotes: 0

Views: 3139

Answers (1)

Keyur Potdar

Reputation: 7248

Try this to get the values from all the rows:

for row in soup.find_all('tr', {'class': 'ssrow', 'style': None}):
    time = row.find('span')['title']
    amount = row.find('td', {'align': 'right'}).find('b').text
    tx = row.find('a').text
    # Print these values however you want.

To check the code against the first row:

row = soup.find('tr', {'class': 'ssrow', 'style': None})
time = row.find('span')['title']
amount = row.find('td', {'align': 'right'}).find('b').text
tx = row.find('a').text
print(time, amount, tx)

Output:

2018-01-20 06:56:43 4.42507599 d142445fd36e6a141a18071110faa8f6f3f9f8a42de888a149d8aa9416fe83ce

Explanation:

Every row is a <tr> tag, but the first <tr> is the heading. To filter it out, I've added the attribute 'class': 'ssrow', since all the other rows have that class. The last row, however, is the total, and its <tr> tag contains style="border-top: 2px solid #eee;". To filter that one out, I've added 'style': None, which matches only tags that have no style attribute.
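
If you want to try the filter without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML markup, the extract_rows helper, and all values in it are made up for illustration; they only imitate the structure described above (a heading row, data rows with class ssrow, and a styled total row), so the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the table described above.
# All dates, amounts and hashes here are placeholder values.
SAMPLE_HTML = """
<table>
  <tr><th>Time</th><th>Amount</th><th>Tx</th></tr>
  <tr class="ssrow">
    <td><span title="2018-01-20 06:56:43">2h ago</span></td>
    <td align="right"><b>4.42507599</b></td>
    <td><a href="#">tx-hash-1</a></td>
  </tr>
  <tr class="ssrow">
    <td><span title="2018-01-20 03:57:00">5h ago</span></td>
    <td align="right"><b>3.90135252</b></td>
    <td><a href="#">tx-hash-2</a></td>
  </tr>
  <tr class="ssrow" style="border-top: 2px solid #eee;">
    <td><b>Total</b></td>
    <td align="right"><b>8.32642851</b></td>
    <td></td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return (time, amount, tx) tuples for the data rows only."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # 'class': 'ssrow' skips the heading row (it has no class);
    # 'style': None skips the total row (it has a style attribute).
    for row in soup.find_all("tr", {"class": "ssrow", "style": None}):
        time_title = row.find("span")["title"]
        amount = row.find("td", {"align": "right"}).find("b").text
        tx = row.find("a").text
        records.append((time_title, amount, tx))
    return records


records = extract_rows(SAMPLE_HTML)
for time_title, amount, tx in records:
    print(time_title, amount, tx)
```

Each span lookup is done on the row itself (row.find) rather than on the whole soup, which is why every record gets its own date instead of the first one on the page.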

Upvotes: 2
