user7626267

Web scraping and python data types

I have a web scraping script using BeautifulSoup4 and Python 3.0. I want to remove the $ sign from the price value in the result, convert it to a float, and perform some numeric operations on it. At the moment the price is a string.

import threading
import time

import requests
from bs4 import BeautifulSoup

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    price = price_box.text.strip()

    print(time.ctime(), name, price)
    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()

Result:

Wed Nov 15 16:37:20 2017 (BTC) $6962.29
Wed Nov 15 16:37:25 2017 (BTC) $6962.29
Wed Nov 15 16:37:31 2017 (BTC) $6962.29
Wed Nov 15 16:37:36 2017 (BTC) $6962.29

Upvotes: 0

Views: 523

Answers (4)

Aaditya Ura

Reputation: 12679

You can check with isdigit(), but the built-in str.isdigit() method only accepts strings made up entirely of digits, so it works for integers and not for floats. You can define your own isdigit() that works for both:

import requests
from bs4 import BeautifulSoup
import time
import threading

new=[]

def isdigit(d):
    try:
        float(d)
        return True
    except ValueError:
        return False

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    # Strip surrounding whitespace first, then the dollar sign
    price = price_box.text.strip().strip('$')
    if isdigit(price):
        price=float(price)
        #do your stuff with price
        print(time.ctime(), name,price)
        print(type(price))


    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()

Output:

Wed Nov 15 17:07:22 2017 (BTC) 7003.54
<class 'float'>
Wed Nov 15 17:07:54 2017 (BTC) 7003.54
<class 'float'>

Upvotes: 1

Lakshmikant Deshpande

Reputation: 844

Here's a simple example:

temp = "$6962.29"
temp = temp.strip("$")  # Removes $ from both sides
temp = float(temp)      # Converts to float
temp += 2               # Adding 2
print(temp)

It should give 6964.29 as output, because we've added 2 to the number.

Upvotes: 1

AidanH

Reputation: 522

If your price is in the format "$100.00", then to remove the dollar symbol you can simply do:

price = price[1:]

This turns "$100.00" into "100.00" by dropping the first character of the string, so it assumes the dollar sign is always the first character.

To convert to a float:

price = float(price)

Altogether it would simply be:

price = float(price[1:])

It may be worth performing some error checking on top of that.
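As a minimal sketch of that error checking: the helper below (parse_price is a hypothetical name, not part of the original script) returns None instead of raising when the text is not a valid price. Removing thousands separators is an assumption about how the site might format larger values.

```python
def parse_price(text):
    """Return the price as a float, or None if the text is not a valid price."""
    # Assumption: prices look like "$6962.29", possibly with surrounding
    # whitespace or thousands separators such as "$7,003.54".
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # Covers empty strings and placeholders such as "N/A"
        return None


print(parse_price("$6962.29"))
print(parse_price(" $7,003.54 "))
print(parse_price("N/A"))
```

Returning None lets the caller skip a bad reading and keep the timer loop running rather than crash on one malformed page.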

Upvotes: 0

Sagun Shrestha

Reputation: 1198

Use the replace() method, or alternatively the strip() method:

import threading
import time

import requests
from bs4 import BeautifulSoup

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    price = price_box.text.strip()

    print(time.ctime(), name, price.replace('$',''))
    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()
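The script above still prints a string. A small sketch of how the two methods differ and how the result might then be converted to a float; the sample strings are made up for illustration, not scraped values:

```python
raw = "$6962.29"

# replace() removes every "$" wherever it appears in the string.
via_replace = raw.replace("$", "")

# strip("$") removes "$" only from the two ends of the string.
via_strip = raw.strip("$")

# With surrounding whitespace, strip("$") finds no "$" at either end
# and leaves the string unchanged, while replace() still removes it.
padded = " $6962.29 "
unchanged = padded.strip("$")
cleaned = padded.replace("$", "")

# float() ignores leading and trailing whitespace, so both convert.
price = float(via_replace)
price_from_padded = float(cleaned)

print(via_replace == via_strip)   # both give "6962.29"
print(unchanged == padded)        # strip("$") did nothing here
print(type(price).__name__, price + 2)
```

If the scraped text may carry whitespace, calling strip() with no arguments before strip("$") avoids the second case.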

Upvotes: 0
