mr_bungles

Reputation: 77

Writing CSV from scraped HTML data

I was able to extract data from the Russian Statistics website using the code below and write it to a CSV file. However, I have two issues. First, I don't know why a blank row is always inserted between two non-blank rows. Second, I don't know how to write a proper table where the data for the same month is spread across different columns; right now, everything ends up in one cell. Thanks.

from bs4 import BeautifulSoup
import lxml
import urllib2
import csv

f=csv.writer(open("Russia.csv","w"))
mainurl='http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/I000750R.HTM'
urlroot='http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/'

data = urllib2.urlopen(mainurl).read()
page = BeautifulSoup(data,'html.parser')

for link in page.findAll('a'):
    page = urllib2.urlopen(urlroot+link.get('href'))
    soup = BeautifulSoup(page, 'lxml')
    years=soup.findAll('title',text=True)

    table = soup.find('center').find('table')
    for row in table.find_all('tr')[3:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        f.writerow([cells])

Upvotes: 1

Views: 203

Answers (1)

alecxe

Reputation: 474241

You are unintentionally making a list of lists here:

cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
f.writerow([cells])

Instead, write the cells list directly:

f.writerow(cells)
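To see both effects in isolation, here is a minimal Python 3 sketch. The question's code is Python 2 (it uses urllib2), and the month name and numbers below are made-up placeholder values, not data from gks.ru. The sketch shows that writerow([cells]) packs the whole list into a single quoted field, while writerow(cells) gives one column per cell. It also opens the output file with newline="", which is the usual fix for the blank rows mentioned in the question: the csv module already ends each row with "\r\n", and on Windows a text-mode file translates the "\n" again, producing "\r\r\n", which spreadsheet programs show as an empty row. In Python 2 the equivalent fix is opening the file with "wb" instead of "w".

```python
import csv
import io
import os
import tempfile

# Placeholder values standing in for the text of one scraped <tr> row.
cells = ["January", "101.2", "99.8"]

# 1. Nested vs. flat rows, written to an in-memory buffer.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([cells])  # the list's repr becomes ONE quoted field
writer.writerow(cells)    # each string becomes its own column
nested_line, flat_line = buf.getvalue().splitlines()
print(nested_line)  # -> "['January', '101.2', '99.8']"
print(flat_line)    # -> January,101.2,99.8

# 2. Writing to a real file: newline="" stops Python from translating the
#    csv module's "\r\n" terminator a second time on Windows.
path = os.path.join(tempfile.mkdtemp(), "Russia.csv")
with open(path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(cells)
    writer.writerow(["February", "100.4", "100.1"])

# Read the file back: two rows, three columns each, no empty rows in between.
with open(path, newline="") as fh:
    rows = list(csv.reader(fh))
print(rows)
```

Applied to the question's loop, the same two changes amount to f = csv.writer(open("Russia.csv", "w", newline="")) in Python 3 (or mode "wb" in Python 2), together with f.writerow(cells).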

Upvotes: 1
