alex

Reputation: 2541

Scrape a table with BeautifulSoup, one row per line

I'm a beginner in Python, and I want to scrape a website with BeautifulSoup. Here is a small part of the page source:

<table class="swift" width="100%">
   <tr>
     <th class="no">ID</th>
     <th>Bank or Institution</th>
     <th>City</th>
     <th class="branch">Branch</th>
     <th>Swift Code</th>
   </tr>   <tr>
     <td align="center">101</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>CONSTANTA</td>
     <td>(CONSTANTA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
   </tr>
   <tr>
     <td align="center">102</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>ORADEA</td>
     <td>(ORADEA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
   </tr>

I managed to scrape the table, but the output puts every cell on its own line:

ID
Bank or Institution
City
Branch
Swift Code

101
BANK LEUMI ROMANIA S.A.
CONSTANTA
(CONSTANTA BRANCH)
DAFBRO22CTA


102
BANK LEUMI ROMANIA S.A.
ORADEA
(ORADEA BRANCH)
DAFBRO22ORA

What I actually want is one comma-separated line per row:

ID, Bank or Institution, City, Branch, Swift Code

101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA

102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA

This is my code:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.theswiftcodes.com/"
nr = 0
page = 'page'
country = 'Romania'
while nr < 4:
    url_country = base_url + country + '/' + page + '/' + str(nr) + '/'
    pages = requests.get(url_country)
    soup = BeautifulSoup(pages.text, 'html.parser')

    # drop <script> tags so their contents don't end up in the text
    for script in soup.find_all('script'):
        script.extract()

    # get_text() flattens each whole table into text, one cell per line
    tabel = soup.find_all("table")
    text = "".join([p.get_text() for p in tabel])
    nr += 1
    print(text)

    # append this page's text to the output file
    with open('swiftcodes.txt', 'a') as file:
        file.write(text)

    # read the file back and print it
    with open('swiftcodes.txt', 'r') as file:
        for item in file:
            print(item)
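A minimal sketch of the same loop that keeps each table row together and writes a real CSV file instead of flattened text. It assumes the URL pattern and page range from the question are valid, that each page has one `<table class="swift">` whose first row holds the `<th>` headers, and it borrows `class_="swift"` and `colspan=False` from the first answer. The lowercase `romania` slug is taken from the links in the HTML above; the names `parse_swift_rows`, `scrape_pages` and `swiftcodes.csv` are hypothetical.

```python
import csv

from bs4 import BeautifulSoup


def parse_swift_rows(html):
    """Return one list of cell strings per <tr> of the table with class "swift"."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="swift")
    if table is None:  # no matching table on this page
        return []
    rows = []
    for tr in table.find_all("tr"):
        # <th> (header) and <td> (data) cells in document order;
        # colspan=False leaves out cells that carry a colspan attribute
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"], colspan=False)]
        if cells:  # drop rows that end up with no cells
            rows.append(cells)
    return rows


def scrape_pages(country="romania", page_numbers=range(4), out_path="swiftcodes.csv"):
    """Fetch each listing page and write all rows to a single CSV file."""
    import requests  # imported here so parse_swift_rows works without requests

    base_url = "https://www.theswiftcodes.com/"
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields that contain commas
        for nr in page_numbers:
            response = requests.get(f"{base_url}{country}/page/{nr}/")
            response.raise_for_status()
            rows = parse_swift_rows(response.text)
            if not rows:
                continue
            if not header_written:
                writer.writerow(rows[0])  # header row, taken from the first page
                header_written = True
            writer.writerows(rows[1:])  # data rows; each page's header is skipped
```

Calling `scrape_pages()` needs network access and the `requests` package; `parse_swift_rows` can be tried on a saved HTML string without either.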

Upvotes: 0

Views: 252

Answers (2)

宏杰李

Reputation: 12168

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.theswiftcodes.com/united-states/')
soup = BeautifulSoup(r.text, 'lxml')  # needs the lxml package installed
rows = soup.find(class_="swift").find_all('tr')
# the first row holds the <th> header cells
th = [th.text for th in rows[0].find_all('th')]
print(th)
for row in rows[1:]:
    # colspan=False keeps only <td> cells without a colspan attribute
    cell = [i.text for i in row.find_all('td', colspan=False)]
    print(cell)

Output:

['ID', 'Bank or Institution', 'City', 'Branch', 'Swift Code']
['1', '1ST CENTURY BANK, N.A.', 'LOS ANGELES,CA', '', 'CETYUS66']
['2', '1ST PMF BANCORP', 'LOS ANGELES,CA', '', 'PMFAUS66']
['3', '1ST PMF BANCORP', 'LOS ANGELES,CA', '', 'PMFAUS66HKG']
['4', '3M COMPANY', 'ST. PAUL,MN', '', 'MMMCUS44']
['5', 'ABACUS FEDERAL SAVINGS BANK', 'NEW YORK,NY', '', 'AFSBUS33']
[]
['6', 'ABBEY NATIONAL TREASURY SERVICES LTD US BRANCH', 'STAMFORD,CT', '', 'ANTSUS33']
['7', 'ABBOTT LABORATORIES', 'ABBOTT PARK,IL', '', 'ABTTUS44']
['8', 'ABBVIE, INC.', 'CHICAGO,IL', '', 'ABBVUS44']
['9', 'ABEL/NOSER CORP', 'NEW YORK,NY', '', 'ABENUS3N']
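Several values in this output contain commas ("1ST CENTURY BANK, N.A.", "LOS ANGELES,CA"), so joining the cells with plain commas would make those lines ambiguous. A minimal sketch using the standard `csv` module, which quotes such fields and here also skips the empty `[]` rows; `rows_to_csv` is a hypothetical helper and the rows are copied from the output above.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize lists of cell strings as CSV text, skipping empty rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes any field containing a comma
    for row in rows:
        if row:  # the scrape above prints [] for a row with no plain <td> cells
            writer.writerow(row)
    return buffer.getvalue()


rows = [
    ["ID", "Bank or Institution", "City", "Branch", "Swift Code"],
    ["1", "1ST CENTURY BANK, N.A.", "LOS ANGELES,CA", "", "CETYUS66"],
    [],
    ["4", "3M COMPANY", "ST. PAUL,MN", "", "MMMCUS44"],
]
print(rows_to_csv(rows), end="")
```

To write straight to a file, pass a file opened with `newline=""` to `csv.writer` instead of the `StringIO` buffer.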

Upvotes: 0

Gábor Erdős

Reputation: 3699

This should do the trick:

from bs4 import BeautifulSoup

str = """<table class="swift" width="100%">
   <tr>
     <th class="no">ID</th>
     <th>Bank or Institution</th>
     <th>City</th>
     <th class="branch">Branch</th>
     <th>Swift Code</th>
   </tr>   <tr>
     <td align="center">101</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>CONSTANTA</td>
     <td>(CONSTANTA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
   </tr>
   <tr>
     <td align="center">102</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>ORADEA</td>
     <td>(ORADEA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
   </tr>"""

soup = BeautifulSoup(str)

for i in soup.find_all("tr"):
    result = ""
    for j in i.find_all("th"): # find all the header tags
        result += j.text + ", "
    for j in i.find_all("td"): # find the cell tags
        result += j.text + ", "
    print(result.rstrip(', ')) 

Output:

ID, Bank or Institution, City, Branch, Swift Code
101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA
102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA
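One caveat with `rstrip(', ')`: it removes any trailing run of `,` and space characters, not just the final separator, so a cell that itself ended in a comma or space would be trimmed too. No cell in this table does, so the output above is unaffected; the cell `"B,"` below is hypothetical. Building the line with `", ".join(...)` avoids the issue, since it only places separators between cells:

```python
cells = ["A", "B,"]  # hypothetical cells; the second one ends in a comma

# accumulate-then-strip, as in the answer above
result = ""
for text in cells:
    result += text + ", "
print(result.rstrip(", "))  # → A, B

# join only inserts separators between cells, so nothing needs stripping
print(", ".join(cells))  # → A, B,
```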

Upvotes: 2
