Reputation: 57
I can successfully scrape the data from this table for THRIVEN:
In the Net % column, however, whether a value is negative or positive seems to be indicated only by CSS styling (that is my guess; I couldn't find where it is defined).
How can I extract these values and write them to my Excel file as negative/positive numbers? Below is my current code:
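import pandas as pd
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

lwb = load_workbook(filename='THRIVEN.xlsx')
lws = lwb['THRI']
# Fetch the page first: BeautifulSoup needs the response body, not the URL string
klseLink = requests.get('https://www.klsescreener.com/v2/stocks/view/7889')
klseParser = BeautifulSoup(klseLink.text, 'html.parser')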
currentQuarterReportTable = klseParser.find('table', {'class': 'financial_reports table table-hover table-sm table-theme'}).findAll('tr', limit=5)
currentQuarterReportSelectedRow = []
print("")
print("==================== CURRENT QUARTER REPORT =====================")
print("")
try:
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        navigatedCurrentQuarterReportColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(4)
        navigatedCurrentQuarterReportColumn.pop(6)
        currentQuarterReportSelectedRow.append(navigatedCurrentQuarterReportColumn)
    currentQuarterReportLimitedTable = pd.DataFrame(currentQuarterReportSelectedRow, columns=['Revenue', 'Profit/Loss', 'Quarter', 'Quarter Date', 'Announced Date', 'Net'])
    currentQuarterReportLimitedTable = currentQuarterReportLimitedTable.rename(index={0: '1', 1: '2', 2: '3', 3: '4'})
    print(currentQuarterReportLimitedTable)
    i = 0
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        i += 1
        selectedColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
        quarter = selectedColumn[5]
        quarterDate = selectedColumn[6]
        announcedDate = selectedColumn[8]
        revenue = (selectedColumn[3].replace("k", "")).replace(",", "")
        profitloss = (selectedColumn[4].replace("k", "")).replace(",", "")
        net = selectedColumn[9].replace("%", "")
        lws.cell(18 + int(i), 3).value = int(quarter)
        lws.cell(18 + int(i), 5).value = quarterDate
        lws.cell(18 + int(i), 7).value = announcedDate
        lws.cell(18 + int(i), 9).value = int(revenue)
        lws.cell(18 + int(i), 11).value = int(profitloss)
        lws.cell(18 + int(i), 13).value = float(net)
except IndexError:
    print("No Quarterly Report from KLScreener")
lwb.save('THRIVEN.xlsx')
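This writes the values to Excel, but every Net % value ends up as a positive number because the scraped text carries no sign.
Note that the Revenue and Profit/Loss colors come from conditional formatting in Excel itself.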
EDIT :
I finally got this working with:
for currentQuarterReportRow in currentQuarterReportTable[1:]:
    # The Net % cell is the second-to-last <td> in each row
    netCell = currentQuarterReportRow.find_all('td')[-2]
    if netCell.find('span', {'class': 'btn-sm btn-danger'}):
        print(float(netCell.get_text().replace('%', '')) * -1)
    else:
        print(float(netCell.get_text().replace('%', '')))
Thanks to @HedgeHog for suggesting the solution! :D
Upvotes: 0
Views: 39
Reputation: 25073
Check the class of the button to tell negative values from positive ones (btn-danger marks a negative value):
if net.select_one('.btn-danger'):
    print(float(net.get_text().replace('%',''))*-1)
else:
    print(float(net.get_text().replace('%','')))
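If you need this check in more than one place, it could be wrapped in a small helper. The following is only a sketch: the function name signed_percent is made up for illustration, and it assumes you pass in the cell text together with the list of class names on the span (BeautifulSoup returns such a list from tag.get('class')). It deliberately uses plain Python so it can be tried without fetching the page.

```python
def signed_percent(text, classes):
    """Convert a Net % cell such as '20%' to a signed float.

    `classes` is the list of CSS class names on the cell's <span>.
    Assumption: 'btn-danger' marks a negative value, anything else
    is treated as positive.
    """
    value = float(text.strip().replace('%', ''))
    if 'btn-danger' in (classes or []):
        return -value
    return value


print(signed_percent('20%', ['btn-sm', 'btn-danger']))   # -20.0
print(signed_percent('35%', ['btn-sm', 'btn-success']))  # 35.0
```

With BeautifulSoup, a call might look like signed_percent(span.get_text(), span.get('class')), where span is the element found inside the Net % cell.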
Example
from bs4 import BeautifulSoup
html='''
<tr class="table-alternate">
<td class="number">-1.20</td>
<td class="number">0.000</td>
<td class="number">0.3400</td>
<td class="number">34,780k</td>
<td class="number">-6,537k</td>
<td class="text-center">4</td>
<td><span style="white-space: nowrap">2020-12-31</span></td>
<td><span style="white-space: nowrap">31 Dec, 2020</span></td>
<td><span style="white-space: nowrap">2021-02-25</span></td>
<td class="number"><span class="btn-sm btn-danger">20%</span></td>
<td><a href="/v2/stocks/financial-report/7889/2020-12-31" target="_blank">View</a> </td>
</tr>
<tr class="table-alternate">
<td class="number">1.27</td>
<td class="number">0.000</td>
<td class="number">0.3500</td>
<td class="number">49,244k</td>
<td class="number">6,959k</td>
<td class="text-center">3</td>
<td><span style="white-space: nowrap">2020-09-30</span></td>
<td><span style="white-space: nowrap">31 Dec, 2020</span></td>
<td><span style="white-space: nowrap">2020-11-20</span></td>
<td class="number"><span class="btn-sm btn-success">35%</span></td>
<td><a href="/v2/stocks/financial-report/7889/2020-09-30" target="_blank">View</a> </td>
</tr>
'''
soup = BeautifulSoup(html,'html.parser')
for currentQuarterReportRow in soup.find_all('tr'):
    # The Net % cell is the second-to-last <td> in each row
    net = currentQuarterReportRow.find_all('td')[-2]
    if net.select_one('.btn-danger'):
        print(float(net.get_text().replace('%',''))*-1)
    else:
        print(float(net.get_text().replace('%','')))
Output
-20.0
35.0
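To get from here to the numbers written into the workbook, the sign check can be combined with the same string cleanup used in the question (stripping "k" and "," from Revenue and Profit/Loss). A rough sketch, assuming the column positions from the question (3, 4, 5, 6, 8, 9) and using made-up helper names parse_k and row_to_values; the cell texts below are copied from the first example row, and the net_is_negative flag stands in for the result of the btn-danger check:

```python
def parse_k(text):
    """Turn '34,780k' or '-6,537k' into an int (value stays in thousands)."""
    return int(text.strip().replace('k', '').replace(',', ''))


def row_to_values(cells, net_is_negative):
    """Map the stripped <td> texts of one row to Excel-ready values.

    `cells` follows the column order of the table in the question;
    `net_is_negative` is the outcome of the btn-danger check.
    """
    net = float(cells[9].replace('%', ''))
    return {
        'quarter': int(cells[5]),
        'quarter_date': cells[6],
        'announced_date': cells[8],
        'revenue': parse_k(cells[3]),
        'profit_loss': parse_k(cells[4]),
        'net': -net if net_is_negative else net,
    }


# Cell texts taken from the first <tr> of the example HTML
first_row = ['-1.20', '0.000', '0.3400', '34,780k', '-6,537k', '4',
             '2020-12-31', '31 Dec, 2020', '2021-02-25', '20%', 'View']
values = row_to_values(first_row, net_is_negative=True)
print(values['revenue'], values['profit_loss'], values['net'])
```

Each entry of the returned dict can then be assigned to lws.cell(row, column).value as in the question, so the Net % column receives a signed float.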
Upvotes: 1