Anant Gupta

Reputation: 53

Parsing HTML Tables in Python with BeautifulSoup

I am trying to access the data in the second table onwards (the tables on the Specifications tab), but my code only returns data from the first table. After reading many of the other posts on SO, I came up with the following, which does not come close to creating the lists I want:

import csv

import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")
table = soup.find("table")

# Collect the text of every <td> cell, one list per <tr> row
output_rows = []
for table_row in table.find_all('tr'):
    columns = table_row.find_all('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

print(output_rows)

Upvotes: 2

Views: 99

Answers (2)

KunduK

Reputation: 33384

You need to select tables by their class name when you are targeting specific tables. Try the following CSS selector:

Code:

import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")
# Every <table class="specs"> except those that also carry the class "features"
tables = soup.select("table.specs:not(.features)")

output_rows = []
for table in tables:
    for table_row in table.find_all('tr'):
        columns = table_row.find_all('td')
        output_row = []
        for column in columns:
            output_row.append(column.text.strip())
        output_rows.append(output_row)

print(output_rows)

Output:

[['Engine', '1197cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC'], ['Engine Type', 'VVT'], ['Fuel Type', 'Petrol'], ['Max Power (bhp@rpm)', '82 bhp @ 6000 rpm'], ['Max Torque (Nm@rpm)', '115 Nm @ 4000 rpm'], ['Mileage (ARAI)', '21.01 kmpl'], ['Drivetrain', 'FWD'], ['Transmission', 'Manual - 5 Gears'], ['Emission Standard', 'BS 6'], ['Length', '3995 mm'], ['Width', '1745 mm'], ['Height', '1510 mm'], ['Wheelbase', '2520 mm'], ['Ground Clearance', '170 mm'], ['Kerb Weight', '865 kg'], ['Doors', '5 Doors'], ['Seating Capacity', '5 Person'], ['No of Seating Rows', '2 Rows'], ['Bootspace', '339 litres'], ['Fuel Tank Capacity', '37 litres'], ['Suspension Front', 'McPherson Strut'], ['Suspension Rear', 'Torsion Beam'], ['Front Brake Type', 'Disc'], ['Rear Brake Type', 'Drum'], ['Minimum Turning Radius', '4.9 metres'], ['Steering Type', 'Power assisted (Electric)'], ['Wheels', 'Steel Rims'], ['Spare Wheel', 'Steel'], ['Front Tyres', '185 / 65 R15'], ['Rear Tyres', '185 / 65 R15']]
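To see what the selector does without hitting the live site, here is a minimal offline sketch. The markup is hypothetical and assumes, as the selector suggests, that the feature tables carry both the specs and features classes; the real page's class names may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (class names assumed).
html = """
<table class="specs"><tr><td>Engine</td><td>1197cc</td></tr></table>
<table class="specs features"><tr><td>Airbags</td><td>2</td></tr></table>
<table class="specs"><tr><td>Length</td><td>3995 mm</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# "table.specs" matches every <table> whose class list contains "specs";
# ":not(.features)" then drops any of those that also have "features".
tables = soup.select("table.specs:not(.features)")

# One list of stripped cell texts per <tr>, across all matched tables.
output_rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for table in tables
    for tr in table.find_all("tr")
]
print(output_rows)
```

Only the first and third tables survive the filter, so the "Airbags" row is skipped.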

Upvotes: 1

chitown88

Reputation: 28565

.find() only returns the first element/tag it finds. You want to use .find_all(), which returns a list of all the matching elements/tags.
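A small offline sketch of the difference, using hypothetical markup in place of the real page: find() hands back one Tag, while find_all() returns a list-like ResultSet you can slice to skip the first table.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three small tables standing in for the real page.
html = """
<table id="overview"><tr><td>Price</td><td>N/A</td></tr></table>
<table class="specs"><tr><td>Engine</td><td>1197cc</td></tr></table>
<table class="specs"><tr><td>Fuel Type</td><td>Petrol</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")            # a single Tag: only the first <table>
all_tables = soup.find_all("table")   # a ResultSet (list-like) of every <table>


def table_rows(table):
    """Return each <tr> of a table as a list of stripped <td> texts."""
    return [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in table.find_all("tr")]


# Skip the first table and collect rows from the second one onwards.
rows = [row for table in all_tables[1:] for row in table_rows(table)]
print(rows)
```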

HOWEVER, may I suggest pandas in this situation? pandas' .read_html() parses the page with lxml (or with BeautifulSoup plus html5lib as a fallback), looks for <table> tags, and returns them as a list of DataFrames. It's then just a matter of selecting the index positions of the tables you want. Looking at the site, the specification tables appear to be at index positions 1-4:

import pandas as pd

dfs = pd.read_html('http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/')

# DataFrame.append was removed in pandas 2.0, so concatenate the tables instead;
# ignore_index=True gives the combined frame a fresh 0..n-1 index
result = pd.concat(dfs[1:5], sort=False, ignore_index=True)
print(result)

Output:
                         0                                                  1
0                   Engine  1197cc, 4 Cylinders Inline, 4 Valves/Cylinder,...
1              Engine Type                                                VVT
2                Fuel Type                                             Petrol
3      Max Power (bhp@rpm)                                  82 bhp @ 6000 rpm
4      Max Torque (Nm@rpm)                                  115 Nm @ 4000 rpm
5           Mileage (ARAI)                                         21.01 kmpl
6               Drivetrain                                                FWD
7             Transmission                                   Manual - 5 Gears
8        Emission Standard                                               BS 6
9                   Length                                            3995 mm
10                   Width                                            1745 mm
11                  Height                                            1510 mm
12               Wheelbase                                            2520 mm
13        Ground Clearance                                             170 mm
14             Kerb Weight                                             865 kg
15                   Doors                                            5 Doors
16        Seating Capacity                                           5 Person
17      No of Seating Rows                                             2 Rows
18               Bootspace                                         339 litres
19      Fuel Tank Capacity                                          37 litres
20        Suspension Front                                    McPherson Strut
21         Suspension Rear                                       Torsion Beam
22        Front Brake Type                                               Disc
23         Rear Brake Type                                               Drum
24  Minimum Turning Radius                                         4.9 metres
25           Steering Type                          Power assisted (Electric)
26                  Wheels                                         Steel Rims
27             Spare Wheel                                              Steel
28             Front Tyres                                       185 / 65 R15
29              Rear Tyres                                       185 / 65 R15
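The concatenation step can be tried offline with a minimal sketch. The small DataFrames below are hypothetical stand-ins for the list that pd.read_html() returns; no parser (lxml or html5lib) is needed because nothing is read from HTML here.

```python
import pandas as pd

# Hypothetical stand-ins for dfs = pd.read_html(url); contents are made up.
dfs = [
    pd.DataFrame([["Price", "N/A"]]),
    pd.DataFrame([["Engine", "1197cc"], ["Fuel Type", "Petrol"]]),
    pd.DataFrame([["Length", "3995 mm"]]),
]

# Keep tables 1 onwards and stack them into one frame with a fresh 0..n-1 index.
result = pd.concat(dfs[1:], ignore_index=True)
print(result)
```

Because the frames share the same integer column labels (0 and 1), they stack row-wise into a single two-column table, just like the output above.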

Upvotes: 1
