Reputation: 94
I am trying to keep moving to the next page for as long as the 'next' button exists at this link: 'https://www.cbp.gov/contact/find-broker-by-port/4901?page=1'. I realized that the requests response does not contain the button, so BeautifulSoup cannot find it. I tried adding headers and a user-agent to the request, but the element still does not appear. As far as I can tell, there is no JavaScript generating content on this page. Here is the code. What am I missing?
from csv import reader

import requests
from bs4 import BeautifulSoup


def second_links(second_links_list=[], page2_num=0):
    try:
        with open('port.csv', 'r') as read_obj:
            csv_reader = reader(read_obj)
            for row in csv_reader:
                # The last column holds the URL template for the port page
                row = row[-1]
                page2 = requests.get(row.format(page2_num))
                soup2 = BeautifulSoup(page2.content, 'html')
                results2 = soup2.find(id='region-content')
                table2cells = results2.find_all('td', class_='views-field views-field-title views-align-center')
                for cell in table2cells:
                    cell2link = cell.find('a', href=True)
                    second_links_list.append('https://www.cbp.gov'+cell2link['href'])
                # This lookup is the one that comes back empty
                next2_page = results2.find('li', class_='pager-next')
                if next2_page:
                    page2_num += 1
                    second_links(second_links_list, page2_num)
        return second_links_list
    except requests.exceptions.ConnectionError:
        page2.status_code = 'connection refused'
Upvotes: 0
Views: 80
Reputation: 11515
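import requests
import pandas as pd


def main(url):
    # Reuse one session so the connection is kept alive across pages
    with requests.Session() as req:
        allin = []
        # Pages are zero-indexed; this port has three of them
        for item in range(3):
            r = req.get(url.format(item))
            # The broker list is the first table on the page
            df = pd.read_html(r.content)[0]
            allin.append(df)
        new = pd.concat(allin)
        print(new)
        new.to_csv("data.csv", index=False)


main("https://www.cbp.gov/contact/find-broker-by-port/4901?page={}")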
Output:

   Broker Name                                        Broker Filer Code
0  AXIOM TRADE INC                                    BTL
1  DE LA CRUZ CUSTOMS BROKER INC                      ENM
2  ECI CUSTOMS BROKERAGE INC                          BGZ
3  EDWIN SEDA PEREZ                                   9JD
4  EXPEDITORS INT'L (PUERTO RICO) INC                 ES9
5  GRISEL PADILLA                                     MU8
6  INTEGRITY CUSTOMS BROKERAGE LLC                    9QB
7  INTER-WORLD CUSTOMS BROKERS INC                    N35
8  JAIME MADURO SANTANA                               ALA
9  JOSE G FLORES                                      256
0  JOSE M RAMOS GARCIA                                97Q
1  JOSE R BERMUDEZ                                    9HD
2  JUAN GARCIA                                        9ST
3  JULIO CACERES DBA TRADEWORKS INC                   97D
4  JULIO RODRIGUEZ USCB CORP                          EWV
5  MANUEL A RAMOS                                     G68
6  MANUEL RAMOS-GANDIA INC                            CDX
7  NESTOR REYES INC                                   508
8  NORBERTO DAVID COLON                               BLC
9  P R INTERNATIONAL CUSTOMS BROKERS                  D05
0  PANALPINA INC                                      554
1  PEDRO L CARMONA INC                                BWV
2  PEDRO L SITIRICHE-TORRES                           E9T
3  RADIX GROUP INTERNATIONAL INC DBA DHL GLOBAL F...  336
4  RANK SHIPPING OF PUERTO RICO INC                   D84
5  RENE ORTIZ-VILLAFANE INC                           438
6  ROSA MARINA FLORES-ALVAREZ                         NZ5
7  UPS SUPPLY CHAIN SOLUTIONS INC                     UPS
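The page count of 3 is hard-coded above, so other ports would need a different number. A minimal sketch of one way around that, without relying on the 'next' button at all: keep requesting pages until one comes back empty or identical to the previous one. The names `collect_pages` and `fetch_broker_rows` are made up for this example, and the stop conditions are an assumption about how the site behaves past the last page, which I have not verified.

```python
from typing import Callable, List, Sequence

Row = Sequence[str]


def collect_pages(fetch_page: Callable[[int], List[Row]],
                  max_pages: int = 100) -> List[Row]:
    """Collect rows from page 0, 1, 2, ... until a page adds nothing new."""
    all_rows: List[Row] = []
    previous: List[Row] = []
    for page_num in range(max_pages):
        rows = fetch_page(page_num)
        # Stop on an empty page, or when the same page is served again
        if not rows or rows == previous:
            break
        all_rows.extend(rows)
        previous = rows
    return all_rows


def fetch_broker_rows(page_num: int) -> List[Row]:
    """Hypothetical fetcher for the broker table (needs requests, pandas, lxml)."""
    import io

    import pandas as pd
    import requests

    url = "https://www.cbp.gov/contact/find-broker-by-port/4901?page={}"
    r = requests.get(url.format(page_num), timeout=30)
    try:
        df = pd.read_html(io.StringIO(r.text))[0]
    except ValueError:
        # read_html raises ValueError when the page contains no table
        return []
    return df.astype(str).values.tolist()
```

Calling `collect_pages(fetch_broker_rows)` would then return every row as a list of strings. The `max_pages` cap is only a safety net in case neither stop condition ever triggers.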
Upvotes: 1