Get content of table in BeautifulSoup

Question

I have the following table on a website, which I am extracting with BeautifulSoup. This is the URL (I have also attached a picture).

Ideally, I would like to have each company on one row in the CSV; however, I am getting its cells on separate rows. Please see the attached picture.

I would like to have it as in field "D", but I am getting it in A1, A2, A3...

This is the code I am using to extract:

import csv

import requests
from bs4 import BeautifulSoup

def _writeInCSV(text):
    print "Writing in CSV File"
    with open('sara.csv', 'wb') as csvfile:
        #spamwriter = csv.writer(csvfile, delimiter='\t',quotechar='\n', quoting=csv.QUOTE_MINIMAL)
        spamwriter = csv.writer(csvfile, delimiter='\t',quotechar="\n")

        for item in text:
            spamwriter.writerow([item])

read_list=[]
initial_list=[]


url="http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
r=requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

#gdata_even=soup.find_all("td", {"class":"ms-rteTableEvenRow-3"})

gdata_even=soup.find_all("td", {"class":"ms-rteTable-default"})




for item in gdata_even:
    print item.text.encode("utf-8")
    initial_list.append(item.text.encode("utf-8"))
    print ""

_writeInCSV(initial_list)

Can someone help, please?

alecxe · Accepted Answer

Here is the idea:

read the header cells from the table
read all the other rows from the table
zip all the data row cells with headers producing a list of dictionaries
use csv.DictWriter() to dump to csv
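
The zipping step can be tried on its own before touching the page. This is a minimal sketch with made-up header and cell values (the real page's column names are not shown here, so `Company`, `Dividend` and `Closure Date` are assumptions), written for Python 3:

```python
# Made-up header and cell values, standing in for text scraped from <td> cells.
headers = ["Company", "Dividend", "Closure Date"]
rows = [
    ["Company A", "0.50", "1 June"],
    ["Company B", "1.20", "15 June"],
]

# zip() pairs each header with the cell in the same column position,
# so every table row becomes one dictionary.
data = [dict(zip(headers, cells)) for cells in rows]

print(data[0]["Company"])
```

Each dictionary then maps to exactly one line in the CSV, which is what keeps a company's cells together on one row.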

Implementation:

import csv
from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = "http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

rows = soup.select("table.ms-rteTable-default tr")
headers = [header.get_text(strip=True).encode("utf-8") for header in rows[0].find_all("td")]

data = [dict(zip(headers, [cell.get_text(strip=True).encode("utf-8") for cell in row.find_all("td")]))
        for row in rows[1:]]

# see what the data looks like at this point
pprint(data)

with open('sara.csv', 'wb') as csvfile:
    spamwriter = csv.DictWriter(csvfile, headers, delimiter='\t', quotechar="\n")

    for row in data:
        spamwriter.writerow(row)
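
The code above is written for Python 2 (binary-mode file, `.encode("utf-8")` on every cell). Under Python 3 the writing step might look like the sketch below. It assumes the rows have already been collected as dictionaries of plain `str` values; the helper name `write_rows` and the sample values are hypothetical, and it also writes a header line, which the code above does not do.

```python
import csv
import io


def write_rows(fileobj, headers, data):
    """Write a header line, then one tab-separated line per dictionary."""
    writer = csv.DictWriter(fileobj, fieldnames=headers, delimiter="\t")
    writer.writeheader()
    writer.writerows(data)


# Hypothetical sample values, not taken from the real page.
headers = ["Company", "Dividend"]
data = [
    {"Company": "Company A", "Dividend": "0.50"},
    {"Company": "Company B", "Dividend": "1.20"},
]

# An in-memory buffer keeps the example free of file-system side effects.
buffer = io.StringIO()
write_rows(buffer, headers, data)
print(buffer.getvalue())
```

To write a real file instead, pass a file opened in text mode, for example `open('sara.csv', 'w', newline='', encoding='utf-8')`; the `csv` module documentation recommends `newline=''` so that line endings are not translated twice.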
