Reputation: 39
I'm trying to pull data from several sites. I wrote a script that works well, but when I print the results, each record comes out as one long run-on sentence: no comma, no delimiter, nothing. Nothing I tried myself has fixed it.
This is the site I am working on: http://www.conditions-de-banque-tunisie.com/banques-en-tunisie.html
I have tried to put a comma between the results, but it only shows up once, at the end:
linksname.find_all('p')[i].text + ','
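For example, with made-up text standing in for the scraped paragraph, the appended comma only lands after the whole string:

```python
# Made-up text standing in for one scraped paragraph: appending ','
# adds a single comma at the very end, not between the fields.
text = "North Africa International BankAdresse : Avenue Kheireddine Pacha"
result = text + ','
print(result)  # comma appears only after the whole string
```

Here is my full script: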
import requests
import bs4
import csv
import io
response = requests.get('http://www.conditions-de-banque-tunisie.com/banques-en-tunisie.html')
response.status_code
soup_obj = bs4.BeautifulSoup(response.text, "html.parser")
soup_obj.prettify()
#print('shhh')
linksname = soup_obj.find(class_='bloc-banques-liste')
#linksname.text
textContent = []
for i in range(0, 1):
    links = linksname.find_all('p')[i].text
    textContent.append(links)
for text in textContent:
    print('----------------------------')
    print(text)
    with io.open("fname.txt", "w", encoding="utf-8") as f:
        f.write(text)
This is the result:
North Africa International BankAdresse : Avenue Kheireddine Pacha Ennassim Montplaisir 1002 TUNISTé : +216 71 950 800Fax : +216 71 950 840Site web : http://www.naibbank.com/
Qatar National BankAdresse : Rue de la cité des sciences - B.P. 320 - 1080 Tunis CedexTé : +216 71 750 000Fax : +216 71 235 611Site web : http://www.qnb.com.tn/
I expect the result to look like:
North Africa International Bank , Adresse : Avenue Kheireddine Pacha Ennassim Montplaisir 1002 , TUNISTé : +216 71 950 800 , Fax : +216 71 950 840 ,Site web : http://www.naibbank.com/
Or, even better:
North Africa International Bank , venue Kheireddine Pacha Ennassim Montplaisir 1002 , +216 71 950 800 , +216 71 950 840 , : http://www.naibbank.com/
Upvotes: 0
Views: 487
Reputation: 438
Instead of using the "bloc-banques-liste" class to find objects, you can use "banques-liste-desc".
This will directly give you the list of all blocks.
Check the following code.
import requests
import bs4

response = requests.get('http://www.conditions-de-banque-tunisie.com/banques-en-tunisie.html')
soup_obj = bs4.BeautifulSoup(response.text, "html.parser")
linksname = soup_obj.find_all(class_='banques-liste-desc')
for i in range(0, len(linksname)):
    name = linksname[i].find('h1').find('a').text
    print(name)
    address = linksname[i].find_all('p')
    for j in range(0, len(address)):
        print(address[j].text)
    print("----------------------------")
Here I have printed all the values separately; instead of that, you can join them with a comma.
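For instance (a small sketch with hypothetical field values; in the loop above you would collect `name` and each `address[j].text` into a list first):

```python
# Sketch: collect the fields in a list, then join them with " , ".
# The sample values below are made-up stand-ins for the scraped text.
fields = [
    "North Africa International Bank",
    "Adresse : Avenue Kheireddine Pacha Ennassim Montplaisir 1002 TUNIS",
    "Fax : +216 71 950 840",
]
line = " , ".join(fields)
print(line)
```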
Upvotes: 0
Reputation: 24930
The following is a somewhat simplified version of your code, but it should get you to where you need to be, after fitting it to your own style:
from bs4 import BeautifulSoup as bs
import requests

response = requests.get('http://www.conditions-de-banque-tunisie.com/banques-en-tunisie.html')
soup = bs(response.text, "html.parser")
textContent = []
linksname = soup.find(class_='bloc-banques-liste')
for p in linksname.find_all('p'):
    textContent.append(p.text)
for bank in textContent:
    print(bank.replace(' :', ',').strip())
Output:
Al BarakaAdresse, 88, Avenue Hedi Chaker 1002 TunisTé, +216 71 790 000Fax, +21671 780 235Email, [email protected] web, http://www.albarakabank.com.tn/
Amen BankAdresse, Avenue Mohamed V 1002 Tunis - TunisieTé, (+216) 71 148 000Fax, (+216) 71 833 517Site web, http://www.amenbank.com.tn/
etc.
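The `replace(' :', ',')` call is what turns the label separators into commas; a minimal sketch on a made-up string:

```python
# Sketch of the replace trick on a made-up string: every " :" label
# separator becomes a comma.
raw = "Amen BankAdresse : Avenue Mohamed V 1002 Tunis"
print(raw.replace(' :', ',').strip())  # → Amen BankAdresse, Avenue Mohamed V 1002 Tunis
```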
Upvotes: 0
Reputation: 1265
Check the code below and let me know if that helps.
import requests
import bs4

response = requests.get('http://www.conditions-de-banque-tunisie.com/banques-en-tunisie.html')
soup_obj = bs4.BeautifulSoup(response.text, "html.parser")
linksname = soup_obj.find(class_='bloc-banques-liste')
links = linksname.findChildren("div", class_='banques-liste-desc', recursive=True)
links = [" \n ".join([p.text for p in link.findChildren("p")]) for link in links]
print(str(links))
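Since your script already imports csv, a natural next step is to write the records to a CSV file instead of printing them; a hypothetical sketch (the file name and sample values are made up):

```python
import csv

# Hypothetical sketch: write each bank's fields as one row of a CSV file.
# `rows` stands in for data collected by the scraping code above.
rows = [
    ["North Africa International Bank", "Avenue Kheireddine Pacha", "+216 71 950 800"],
    ["Amen Bank", "Avenue Mohamed V 1002 Tunis", "+216 71 148 000"],
]
with open("banks.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```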
Upvotes: 1