Reputation: 53
I am trying to access the data from the second table onwards, i.e. the tables on the specifications tab, but my code only returns data from the first table. After reading many other posts on SO, I came up with the following, which does not come close to creating the lists I want:
import csv

import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")
# find() returns only the first <table> on the page
table = soup.find("table")
output_rows = []
for table_row in table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)
output_rows
Upvotes: 2
Views: 99
Reputation: 33384
You need to select the table by its class name when you are targeting a specific table. Try the following CSS selector.
Code:
import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")
# Match every table with class "specs", skipping those that also have class "features"
tables = soup.select("table.specs:not(.features)")
output_rows = []
for table in tables:
    for table_row in table.findAll('tr'):
        columns = table_row.findAll('td')
        output_row = []
        for column in columns:
            # strip() removes the surrounding whitespace from each cell
            output_row.append(column.text.strip())
        output_rows.append(output_row)
print(output_rows)
Output:
[['Engine', '1197cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC'], ['Engine Type', 'VVT'], ['Fuel Type', 'Petrol'], ['Max Power (bhp@rpm)', '82 bhp @ 6000 rpm'], ['Max Torque (Nm@rpm)', '115 Nm @ 4000 rpm'], ['Mileage (ARAI)', '21.01 kmpl'], ['Drivetrain', 'FWD'], ['Transmission', 'Manual - 5 Gears'], ['Emission Standard', 'BS 6'], ['Length', '3995 mm'], ['Width', '1745 mm'], ['Height', '1510 mm'], ['Wheelbase', '2520 mm'], ['Ground Clearance', '170 mm'], ['Kerb Weight', '865 kg'], ['Doors', '5 Doors'], ['Seating Capacity', '5 Person'], ['No of Seating Rows', '2 Rows'], ['Bootspace', '339 litres'], ['Fuel Tank Capacity', '37 litres'], ['Suspension Front', 'McPherson Strut'], ['Suspension Rear', 'Torsion Beam'], ['Front Brake Type', 'Disc'], ['Rear Brake Type', 'Drum'], ['Minimum Turning Radius', '4.9 metres'], ['Steering Type', 'Power assisted (Electric)'], ['Wheels', 'Steel Rims'], ['Spare Wheel', 'Steel'], ['Front Tyres', '185 / 65 R15'], ['Rear Tyres', '185 / 65 R15']]
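To see what the selector does without hitting the live site, here is a minimal sketch run against a made-up HTML string. The sample markup, the `overview` class, and the `extract_rows` helper are assumptions for illustration only; just the `specs` and `features` class names come from the selector above. It uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not copied from the site)
SAMPLE_HTML = """
<table class="overview"><tr><td>Price</td><td>On request</td></tr></table>
<table class="specs"><tr><td>Fuel Type</td><td> Petrol </td></tr></table>
<table class="specs features"><tr><td>Airbags</td><td>Yes</td></tr></table>
"""


def extract_rows(html, selector):
    """Collect the stripped <td> texts of every row in the tables matched by selector."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.select(selector):
        for table_row in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in table_row.find_all("td")]
            if cells:  # skip rows without <td> cells, e.g. header rows
                rows.append(cells)
    return rows


# Only the second table matches: the first lacks "specs", the third also has "features"
rows = extract_rows(SAMPLE_HTML, "table.specs:not(.features)")
print(rows)
```

On this sample only the `Fuel Type` row is returned, since `:not(.features)` drops any table that carries both classes.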
Upvotes: 1
Reputation: 28565
.find() will only return the first element/tag it finds. You want to use .find_all(), which returns a list of all the elements/tags specified.
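A small sketch of the difference, using a made-up three-table document rather than the real page (the sample HTML and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical document with three tables
SAMPLE_HTML = """
<table><tr><td>first</td></tr></table>
<table><tr><td>second</td></tr></table>
<table><tr><td>third</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first match and returns a single Tag (or None)
first_table = soup.find("table")

# find_all() returns a list-like ResultSet of every match
all_tables = soup.find_all("table")

# Slice the result to read "the second table onwards"
later_cells = [
    td.get_text(strip=True)
    for table in all_tables[1:]
    for td in table.find_all("td")
]
print(later_cells)
```

Slicing the result of `find_all()` is one way to skip the first table without knowing anything about class names.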
However, may I suggest pandas in this situation? pandas' .read_html() parses the page with lxml or BeautifulSoup under the hood and looks for those <table> tags. It then returns them as a list of DataFrames. It's just a matter of selecting the index position of the table you want. Looking at the site, the specification tables appear to be returned in index positions 1-4:
import pandas as pd

dfs = pd.read_html('http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/')
# Tables 1-4 hold the specifications. DataFrame.append was removed in pandas 2.0,
# so the tables are combined with pd.concat instead.
result = pd.concat(dfs[1:5], sort=False).reset_index(drop=True)
Output:
print (result)
    0                       1
 0  Engine                  1197cc, 4 Cylinders Inline, 4 Valves/Cylinder,...
 1  Engine Type             VVT
 2  Fuel Type               Petrol
 3  Max Power (bhp@rpm)     82 bhp @ 6000 rpm
 4  Max Torque (Nm@rpm)     115 Nm @ 4000 rpm
 5  Mileage (ARAI)          21.01 kmpl
 6  Drivetrain              FWD
 7  Transmission            Manual - 5 Gears
 8  Emission Standard       BS 6
 9  Length                  3995 mm
10  Width                   1745 mm
11  Height                  1510 mm
12  Wheelbase               2520 mm
13  Ground Clearance        170 mm
14  Kerb Weight             865 kg
15  Doors                   5 Doors
16  Seating Capacity        5 Person
17  No of Seating Rows      2 Rows
18  Bootspace               339 litres
19  Fuel Tank Capacity      37 litres
20  Suspension Front        McPherson Strut
21  Suspension Rear         Torsion Beam
22  Front Brake Type        Disc
23  Rear Brake Type         Drum
24  Minimum Turning Radius  4.9 metres
25  Steering Type           Power assisted (Electric)
26  Wheels                  Steel Rims
27  Spare Wheel             Steel
28  Front Tyres             185 / 65 R15
29  Rear Tyres              185 / 65 R15
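For anyone who wants to try the pandas route offline, here is a rough sketch that feeds `read_html` a made-up HTML string instead of the URL. The sample markup and the `combined` name are assumptions for illustration, and it presumes a parser backend (lxml, or bs4 with html5lib) is installed. Recent pandas versions expect literal HTML to be wrapped in `StringIO`.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: one unrelated table followed by two specification tables
SAMPLE_HTML = """
<table><tr><td>Price</td><td>On request</td></tr></table>
<table><tr><td>Fuel Type</td><td>Petrol</td></tr></table>
<table>
  <tr><td>Length</td><td>3995 mm</td></tr>
  <tr><td>Width</td><td>1745 mm</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> it finds
dfs = pd.read_html(StringIO(SAMPLE_HTML))

# Skip the first table and stack the rest into a single frame
combined = pd.concat(dfs[1:], ignore_index=True)
print(combined)
```

Here `ignore_index=True` renumbers the rows from 0, which has the same effect as calling `.reset_index(drop=True)` afterwards.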
Upvotes: 1