Reputation: 3971
I want to extract a table from the URL below, but I got lost. Here is what I have so far:
import requests
from bs4 import BeautifulSoup

url = "https://www.marinetraffic.com/en/ais/index/ports/all/per_page:50"
headers = {'User-agent': 'Mozilla/5.0'}
raw_html = requests.get(url, headers=headers)
raw_data = raw_html.text
soup_data = BeautifulSoup(raw_data, "lxml")
td = soup_data.findAll('tr')[1:]  # every row except the header
country = []
for data in td:
    col = data.find_all('td')
    country.append(col)
How do I get the text and URL of some of the columns (Country, Port Name, UN/LOCODE, Type, and Port's Map)?
Upvotes: 0
Views: 1202
Reputation: 30605
I did some scraping for you. You can build one dictionary per row, using the table headers as keys, like below. Iterate through the individual td cells to pick out the columns you need, then use find('tag_name')['attribute_name'] to get attributes such as url, src or href, and .text to get the text. Hope this helps.
import requests
from bs4 import BeautifulSoup

url = "https://www.marinetraffic.com/en/ais/index/ports/all/per_page:50"
headers = {'User-agent': 'Mozilla/5.0'}
raw_html = requests.get(url, headers=headers)
raw_data = raw_html.text
soup_data = BeautifulSoup(raw_data, "lxml")
rows = soup_data.find_all('tr')[1:]  # skip the header row
country = []
for row in rows:
    cols = row.find_all('td')
    details = {}  # one dictionary per table row
    for i, col in enumerate(cols):
        if i == 0:
            # flag image: read the src attribute and make it absolute
            details['Img-src'] = "https://www.marinetraffic.com" + col.find('img')['src']
        elif i == 1:
            details["Port_name"] = col.text.replace('\n', '')
        elif i == 2:
            details['UN/LOCODE'] = col.text.replace('\r\n', '').replace(" ", "")
        elif i == 4:
            details['type'] = col.text.replace('\r\n', '').replace(" ", "")
        elif i == 5:
            # map link: read the href attribute and make it absolute
            details['map_url'] = "https://www.marinetraffic.com" + col.find('a')['href']
    country.append(details)
Output:
[{'Img-src': 'https://www.marinetraffic.com/img/flags/png40/CN.png', 'Port_name': 'SHANGHAI', 'UN/LOCODE': 'CNSHA', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:9/centerx:121.614746/centery:31.3663635/showports:true/portid:1253', 'type': 'Port'},
 {'Img-src': 'https://www.marinetraffic.com/img/flags/png40/CN.png', 'Port_name': 'MAANSHAN', 'UN/LOCODE': 'CNMAA', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:14/centerx:118.459503/centery:31.7180004/showports:true/portid:2746', 'type': 'Port'},
 {'Img-src': 'https://www.marinetraffic.com/img/flags/png40/HK.png', 'Port_name': 'HONG KONG', 'UN/LOCODE': 'HKHKG', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:14/centerx:114.181366/centery:22.2879486/showports:true/portid:2429', 'type': 'Port'},
 ... ]
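If you want to try the same idea without hitting the site, here is a minimal sketch that runs on an inline HTML snippet. The sample markup, the `parse_ports` helper, the dictionary key names and the assumption of six cells per row are all made up for illustration; the real page may be laid out differently. It also swaps in two things the code above does not use: `get_text(strip=True)` instead of chained `replace` calls, and `urljoin` instead of string concatenation for the absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.marinetraffic.com"

# Hypothetical sample markup for illustration; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Port Name</th><th>UN/LOCODE</th>
      <th>Photos</th><th>Type</th><th>Map</th></tr>
  <tr>
    <td><img src="/img/flags/png40/CN.png"></td>
    <td>
      <a href="/en/ais/details/ports/1253">SHANGHAI</a>
    </td>
    <td>
      CNSHA
    </td>
    <td></td>
    <td>Port</td>
    <td><a href="/en/ais/home/portid:1253">Show on map</a></td>
  </tr>
  <tr><td colspan="6">row without the expected cells</td></tr>
</table>
"""


def parse_ports(html, base_url=BASE_URL):
    """Return one dict per data row (hypothetical helper, not a library API)."""
    soup = BeautifulSoup(html, "html.parser")
    ports = []
    for row in soup.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if len(cells) < 6:
            continue  # skip rows that do not have the assumed six columns
        img = cells[0].find("img")
        link = cells[5].find("a")
        ports.append({
            # attribute access: tag["src"] / tag["href"], made absolute
            "flag_src": urljoin(base_url, img["src"]) if img else None,
            # text access: get_text(strip=True) drops surrounding whitespace
            "port_name": cells[1].get_text(strip=True),
            "locode": cells[2].get_text(strip=True),
            "type": cells[4].get_text(strip=True),
            "map_url": urljoin(base_url, link["href"]) if link else None,
        })
    return ports


ports = parse_ports(SAMPLE_HTML)
print(ports)
```

Checking `len(cells)` and testing `img` / `link` for `None` before indexing keeps one odd row from raising an exception and stopping the whole scrape.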
Upvotes: 1