Abdul Raoof

Reputation: 133

How to web-scrape Wikipedia tables for multiple companies

I am trying to web-scrape the Wikipedia tables of multiple companies (Samsung, Alibaba, etc.), but I am unable to do so. Below is my code:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst=['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/a")
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = [] 
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())

            print(csvRow)
            writer.writerow(csvRow)

Upvotes: 0

Views: 67

Answers (2)

LightBlue

Reputation: 21

The problem is in this line: html = urlopen("https://en.wikipedia.org/wiki/a")

You are looping through lst to build a URL for each company, but the string literal "a" inside the URL never uses the loop variable, so every iteration requests the same (nonexistent) article.

To fix it, replace html = urlopen("https://en.wikipedia.org/wiki/a") with any one of the following:

  • html = urlopen("https://en.wikipedia.org/wiki/" + a)
  • html = urlopen(f"https://en.wikipedia.org/wiki/{a}") #requires python 3.6+
  • html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
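All three forms are equivalent ways of interpolating the loop variable; a quick sanity check (using Samsung from the question's list) shows they build the same URL:

```python
a = "Samsung"

url_concat = "https://en.wikipedia.org/wiki/" + a           # string concatenation
url_fstring = f"https://en.wikipedia.org/wiki/{a}"          # f-string (Python 3.6+)
url_format = "https://en.wikipedia.org/wiki/{}".format(a)   # str.format

print(url_concat == url_fstring == url_format)  # True
```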

Upvotes: 0

RishiC

Reputation: 826

You are passing the literal character "a" inside the URL string, not the value of the loop variable from the list. Here is the corrected code:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst=['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = [] 
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())

            print(csvRow)
            writer.writerow(csvRow)

csvFile.close()  # flush and close the output file
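One caveat (my addition, not something the answer above covers): list entries such as Yahoo! contain characters that are safer to percent-encode before they go into a URL. The standard library's urllib.parse.quote handles this:

```python
from urllib.parse import quote

base = "https://en.wikipedia.org/wiki/"

# Plain titles pass through unchanged...
print(base + quote("Alibaba_Group"))  # https://en.wikipedia.org/wiki/Alibaba_Group
# ...while special characters are percent-encoded ("!" becomes "%21"),
# which Wikipedia resolves to the same article.
print(base + quote("Yahoo!"))         # https://en.wikipedia.org/wiki/Yahoo%21
```

You could apply quote(a) inside the loop before calling urlopen.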

Upvotes: 1
