Reputation: 65
I am trying to scrape stock data from the web using a for loop over a list of five stocks. The problem is that only the first value is returned, five times. I have tried appending to a list, but it still doesn't work, so clearly I am not appending correctly. On the website, I want the Operating Cash figure, which comes in a form such as 14B or 1B, which is why I remove the B and multiply the number by 1,000,000,000 to get a raw value. Here is my code:
import requests
import yfinance as yf
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User Agent':'Mozilla/5.0'}
stocks = ['AMC','AMD','PFE','AAPL', 'NVDA']
finished_list = []

for stock in stocks:
    url = f'https://www.marketwatch.com/investing/stock/{stock}/financials/cash-flow'
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    operating_cash = soup.findAll('div', class_ = "cell__content")[134].text
    finished_list.append(operating_cash)
    if 'B' in operating_cash:
        cash1 = operating_cash.replace('B','')
        if '(' in cash1:
            cash2 = cash1.replace('(','-')
        if ')' in cash2:
            cash3 = cash2.replace(')','')
            cash3 = float(cash3)
            print(cash3*1000000000)
        else:
            cash1 = float(cash1)
            print(cash1 * 1000000000)
The current output is -1060000000.0 five times in a row, which is the correct operating cash value for AMC but not for the other four. Thanks in advance to anyone who can help me out.
Upvotes: 1
Views: 225
Reputation: 23146
You don't need if conditions around str.replace(): it returns the string unchanged when the substring is not present. Instead, chain all your replacements in one line like so:
for stock in stocks:
    url = f'https://www.marketwatch.com/investing/stock/{stock}/financials/cash-flow'
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    operating_cash = soup.findAll('div', class_ = "cell__content")[134].text
    finished_list.append(operating_cash)
    cash = float(operating_cash.replace('B','').replace('(','-').replace(')',''))
    print(cash*1000000000)
-1060000000.0
1070000000.0000001
14400000000.0
80670000000.0
5820000000.0
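As for why the same number was printed five times: most likely cash2 kept AMC's string on the later iterations, since it is only reassigned when the text contains '('. A helper that builds its result purely from its own argument avoids that kind of leftover state. Below is a minimal sketch of one, not a definitive implementation. The name parse_amount is hypothetical, the K, M and T suffixes are assumptions (the question only mentions B), and the sample strings are inferred from the output above rather than copied from the site.

```python
def parse_amount(text):
    """Convert a string such as '14.4B' or '(1.06B)' to a float.

    Hypothetical helper; the K/M/T suffixes are assumptions,
    only 'B' appears in the question.
    """
    multipliers = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}
    cleaned = text.strip().replace(',', '')
    # Parentheses mark a negative value in accounting notation.
    negative = cleaned.startswith('(') and cleaned.endswith(')')
    cleaned = cleaned.strip('()')
    factor = 1.0
    # Peel off a trailing magnitude suffix if there is one.
    if cleaned and cleaned[-1].upper() in multipliers:
        factor = multipliers[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    value = float(cleaned) * factor
    return -value if negative else value


# Sample strings inferred from the output above, not scraped live.
samples = ['(1.06B)', '1.07B', '14.4B', '80.67B', '5.82B']
for raw in samples:
    print(raw, parse_amount(raw))
```

Inside the scraping loop you would then call parse_amount(operating_cash) and append the returned number to finished_list, so each stock's value is computed from that stock's text alone.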
Upvotes: 1