Writing CSV from scraped HTML data

Question

I was able to extract data from the Russian Statistics website with the code below and write it to a CSV file. However, I have two issues. First, I don't know why a blank row is always inserted between two non-blank rows. Second, I don't know how to write a proper table in which the data for each month is spread across separate columns. Right now, everything ends up in a single cell. Thanks.

from bs4 import BeautifulSoup
import lxml
import urllib2
import csv

f=csv.writer(open("Russia.csv","w"))
mainurl='http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/I000750R.HTM'
urlroot='http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/'

data = urllib2.urlopen(mainurl).read()
page = BeautifulSoup(data,'html.parser')

for link in page.findAll('a'):
    page = urllib2.urlopen(urlroot+link.get('href'))
    soup = BeautifulSoup(page, 'lxml')
    years=soup.findAll('title',text=True)

    table = soup.find('center').find('table')
    for row in table.find_all('tr')[3:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        f.writerow([cells])

alecxe · Accepted Answer

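You are unintentionally wrapping the list in another list here, so writerow receives a row with a single item (the whole list), which is why everything ends up in one cell: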

cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
f.writerow([cells])
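
To see the effect in isolation, here is a minimal sketch in Python 3 (the question's code is Python 2). The cell values are made up and stand in for one scraped table row; the output goes to in-memory buffers rather than a file.

```python
import csv
import io

# Hypothetical values standing in for one scraped table row.
cells = ["January", "101.2", "99.8"]

# Wrapping the list: the row has ONE field, the string form of the list.
nested = io.StringIO()
csv.writer(nested).writerow([cells])

# Passing the list itself: the row has one field per cell.
flat = io.StringIO()
csv.writer(flat).writerow(cells)

nested_line = nested.getvalue()
flat_line = flat.getvalue()

# Reading both lines back shows the difference in field count.
nested_fields = next(csv.reader(io.StringIO(nested_line)))
flat_fields = next(csv.reader(io.StringIO(flat_line)))
print(len(nested_fields), len(flat_fields))
```

The nested version yields a single quoted field holding the whole list, while the flat version yields three separate fields.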

Instead, write the cells list directly:

f.writerow(cells)
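
The blank rows are a separate matter. A likely cause (an assumption, since the question does not say which platform is used) is that the file is opened in text mode on Windows: the csv module ends each row with \r\n, and text mode then translates the \n again, producing \r\r\n. In Python 2 the usual remedy is to open the file with "wb"; in Python 3 it is to pass newline="" to open. The sketch below uses Python 3, a hypothetical file name, and made-up rows in place of the scraped data.

```python
import csv
import os
import tempfile

# Made-up rows standing in for the scraped table data.
rows = [
    ["January", "101.2", "99.8"],
    ["February", "100.4", "100.1"],
]

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "russia_demo.csv")

# newline="" stops Python from translating the line endings that the
# csv module writes, which avoids the extra blank rows on Windows.
with open(path, "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    for cells in rows:
        writer.writerow(cells)  # one field per cell

# Read the raw bytes back to inspect the line endings.
with open(path, "rb") as fh:
    raw = fh.read()
print(raw)
```

Using a with block also closes the file when writing is done, which the original open(...) call without a matching close does not guarantee.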
