Cody

Reputation: 58

Getting table values with Python

I'm trying to get values from an HTML table using Python. The HTML looks like this:

<table border=1 width=900>
 <tr><td width=50%>
<table>
    <tr><td align=right><b>Invoice #</td><td><input type=text value="1624140" size=12></td></tr>
    <tr><td align=right>Company</td><td><input type=text value="NZone" size=40></td></tr>
    <tr><td align=right>Name:</td><td><input type=text value="John Dot" size=40></td></tr>
    <tr><td align=right>Address:</td><td><input type=text value="Posie Row, Moscow Road" size=40></td></tr>
    <tr><td align=right>City:</td><td><input type=text value="Co. Dubllin" size=40></td></tr>
    <tr><td align=right>Province</td><td><input type=text value="" size=40></td></tr>
    <tr><td align=right>Postal Code:</td><td><input type=text value="" size=40></td></tr>
    <tr><td align=right>Country:</td><td><input type=text value="IRELAND" size=40></td></tr>
    <tr><td align=right>Date:</td><td><input type=text value="24.4.18" size=12></td></tr>
    <tr><td align=right>Sub Total:</td><td><input type=text value="93,24" size=40></td></tr>
    <tr><td align=right>Combined Weight:</td><td><input type=text value="1,24" size=40></td></tr>
</table>

My code so far is:

from __future__ import print_function
import requests
import re

from bs4 import BeautifulSoup as bs

request = requests.get('url')

content = request.content

soup = bs(content, 'html.parser')  

table = soup.findChildren('table')[1]

rows = table.findChildren('tr')

for row in rows:
cells = row.findChildren('td')
for cell in cells:
    cell_content = cell.getText()

 print(cell_content)

Output is:

Invoice #
Company
Name:
Address:
City:
Province
Postal Code:
Country:
Date:
Sub Total:
Combined Weight:

I would like the final output to look like this:

Invoice:1624140
Company:NZone
Name:John Dot
Address:Posie Row, Moscow Road
City:Co. Dubllin
Province:
Postal Code:
Country:IRELAND
Date:24.4.18
Sub Total:93,24
Combined Weight:1,24

Upvotes: 1

Views: 8733

Answers (4)

Simon Brahan

Reputation: 2076

Replace your bottom loop with this:

for row in rows:
    [row_title, row_val] = row.findChildren('td')

    print(row_title.getText(), row_val.input['value'])

This code unpacks the two cells in each row, so it assumes every row contains exactly two td elements. It then reads the text of the left td for the row title and drills down into the input inside the right td to read its value attribute.
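For the exact "Label:value" format requested in the question, the same row-by-row idea can be sketched as below. This is only an illustration under a few assumptions: the HTML is a trimmed, three-row copy of the table from the question, extract_pairs is a hypothetical helper name, and stripping the trailing ":" and "#" from labels is a guess at the desired formatting. Rows that do not contain exactly two cells and an input are skipped rather than raising an error.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (three rows only).
html = """
<table>
    <tr><td align=right><b>Invoice #</b></td><td><input type=text value="1624140" size=12></td></tr>
    <tr><td align=right>Company</td><td><input type=text value="NZone" size=40></td></tr>
    <tr><td align=right>Province</td><td><input type=text value="" size=40></td></tr>
</table>
"""


def extract_pairs(markup):
    """Return (label, value) tuples, one per row that has a label cell and an input."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for row in soup.find_all("tr"):
        # Only look at the row's own cells, not cells of nested tables.
        cells = row.find_all("td", recursive=False)
        if len(cells) != 2:
            continue
        field = cells[1].find("input")
        if field is None:
            continue
        # Assumed formatting: drop a trailing ':' or ' #' from the label.
        label = cells[0].get_text(strip=True).rstrip(":# ")
        pairs.append((label, field.get("value", "")))
    return pairs


for label, value in extract_pairs(html):
    print("{}:{}".format(label, value))
```

Skipping malformed rows instead of unpacking directly trades strictness for robustness; if every row is guaranteed to have two cells, the shorter unpacking form above is enough.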

Upvotes: 0

Burhan Khalid

Reputation: 174662

How about a dictionary comprehension?

d = {k.findChild('td').getText().strip():k.findChild('input')['value'] for k in rows}

The result is a dictionary like this:

{'Address:': 'Posie Row, Moscow Road',
 'City:': 'Co. Dubllin',
 'Combined Weight:': '1,24',
 'Company': 'NZone',
 'Country:': 'IRELAND',
 'Date:': '24.4.18',
 'Invoice #': '1624140',
 'Name:': 'John Dot',
 'Postal Code:': '',
 'Province': '',
 'Sub Total:': '93,24'}
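If the goal is the "Label:value" lines from the question, the dictionary can be post-processed with plain Python. The sketch below copies the dictionary shown above verbatim; normalize_key is a hypothetical helper, and removing the trailing ":" and "#" from keys is an assumption about the desired format. The lines come out in the dictionary's insertion order (guaranteed from Python 3.7), which here is the alphabetical order of the listing above rather than the order of the table rows.

```python
# Dictionary copied from the result shown above.
d = {'Address:': 'Posie Row, Moscow Road',
     'City:': 'Co. Dubllin',
     'Combined Weight:': '1,24',
     'Company': 'NZone',
     'Country:': 'IRELAND',
     'Date:': '24.4.18',
     'Invoice #': '1624140',
     'Name:': 'John Dot',
     'Postal Code:': '',
     'Province': '',
     'Sub Total:': '93,24'}


def normalize_key(key):
    """Strip a trailing ':' or ' #' so every label has the same shape."""
    return key.rstrip(":# ")


# Rebuild the dictionary with cleaned keys, then format one line per entry.
cleaned = {normalize_key(k): v for k, v in d.items()}
lines = ["{}:{}".format(k, v) for k, v in cleaned.items()]
print("\n".join(lines))
```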

Upvotes: 1

Shubham Anand

Reputation: 128

Your current code has an indentation error. After assigning rows, did you perhaps intend to write the following? Please check whether it fixes your issue.

rows = table.findChildren('tr')

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        cell_content = cell.getText()
        print(cell_content)

Upvotes: 0

Andrej Kesely

Reputation: 195543

data = """
<table border=1 width=900>
 <tr><td width=50%>
<table>
    <tr><td align=right><b>Invoice #</td><td><input type=text value="1624140" size=12></td></tr>
    <tr><td align=right>Company</td><td><input type=text value="NZone" size=40></td></tr>
    <tr><td align=right>Name:</td><td><input type=text value="John Dot" size=40></td></tr>
    <tr><td align=right>Address:</td><td><input type=text value="Posie Row, Moscow Road" size=40></td></tr>
    <tr><td align=right>City:</td><td><input type=text value="Co. Dubllin" size=40></td></tr>
    <tr><td align=right>Province</td><td><input type=text value="" size=40></td></tr>
    <tr><td align=right>Postal Code:</td><td><input type=text value="" size=40></td></tr>
    <tr><td align=right>Country:</td><td><input type=text value="IRELAND" size=40></td></tr>
    <tr><td align=right>Date:</td><td><input type=text value="24.4.18" size=12></td></tr>
    <tr><td align=right>Sub Total:</td><td><input type=text value="93,24" size=40></td></tr>
    <tr><td align=right>Combined Weight:</td><td><input type=text value="1,24" size=40></td></tr>
</table>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

for (td, inp) in zip(soup.find_all('td', align="right"), soup.find_all('input')):
    print(td.text, inp['value'])

Output is:

Invoice # 1624140
Company NZone
Name: John Dot
Address: Posie Row, Moscow Road
City: Co. Dubllin
Province 
Postal Code: 
Country: IRELAND
Date: 24.4.18
Sub Total: 93,24
Combined Weight: 1,24

Upvotes: 2
