Reputation: 133
I am trying to scrape the Wikipedia tables for multiple companies such as Samsung and Alibaba, but I am not able to do so. Below is my code:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst=['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/a")
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = []
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())
            print(csvRow)
            writer.writerow(csvRow)
Upvotes: 0
Views: 67
Reputation: 21
The problem is this line:

html = urlopen("https://en.wikipedia.org/wiki/a")

You loop through lst to get the URL for each company, but the string literal passed to urlopen contains the letter a, not the value of the loop variable a. Every iteration therefore requests the same page, https://en.wikipedia.org/wiki/a.

To fix this, replace that line with any one of the following:

html = urlopen("https://en.wikipedia.org/wiki/" + a)
html = urlopen(f"https://en.wikipedia.org/wiki/{a}")  # requires Python 3.6+
html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
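
If it helps to see the difference without hitting the network, here is a minimal sketch that only builds the URL strings. The function names (literal_url, concat_url, fstring_url, format_url) and the BASE constant are made up for this illustration and are not part of the question's code.

```python
BASE = "https://en.wikipedia.org/wiki/"  # assumed base URL, taken from the question

def literal_url(a):
    # The original bug: "a" here is just a character in the string,
    # so the argument is ignored.
    return "https://en.wikipedia.org/wiki/a"

def concat_url(a):
    # String concatenation
    return BASE + a

def fstring_url(a):
    # f-string interpolation (Python 3.6+)
    return f"{BASE}{a}"

def format_url(a):
    # str.format interpolation
    return "{}{}".format(BASE, a)

if __name__ == "__main__":
    for name in ["Samsung", "Alibaba_Group"]:
        print(literal_url(name), concat_url(name), fstring_url(name), format_url(name))
```

The three corrected forms produce the same string for a given name, so which one to use is mostly a matter of taste and Python version.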
Upvotes: 0
Reputation: 826
You are passing a as a literal character inside the string, not as a reference to one of the items in the list. Here is the corrected code:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

lst = ['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']

# newline='' lets the csv module control line endings; utf-8 copes with non-ASCII text.
# The with block closes the file once all rows are written.
with open('Information.csv', 'w', newline='', encoding='utf-8') as csvFile:
    writer = csv.writer(csvFile)
    for a in lst:
        # Insert the current company name into the URL
        html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
        bs = BeautifulSoup(html, 'html.parser')
        # Write every row of every table on the page
        for table in bs.find_all('table'):
            for row in table.find_all('tr'):
                csvRow = []
                for cell in row.find_all(['td', 'th']):
                    csvRow.append(cell.get_text())
                print(csvRow)
                writer.writerow(csvRow)
Upvotes: 1