Duck1337

Reputation: 524

Formatting HTML into CSV

I'm scraping a website using Ruby with Nokogiri.

This script creates a local text file, opens a URL, and writes the text of every `tr/td` match to the file. It works fine.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

DOC_URL_FILE = "doc.csv" 

url = "http://www.SuperSecretWebSite.com"

data = Nokogiri::HTML(open(url))


all_data = data.xpath('//tr/td').text

File.open(DOC_URL_FILE, 'w'){|file| file.write all_data} 

Each row has five fields, which I would like written horizontally, moving to the next line after five cells are filled. The data is all there, but it isn't usable.

I was hoping to learn how to write CSV-formatting code that:

  1. While reading the document, dumps each group of five <td>…</td> cells into its own cells horizontally.
  2. Moves to the next line, and so on.
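For reference, the regrouping described above can be sketched with Ruby's each_slice, assuming exactly five cells per row (the cell values below are stand-ins for the scraped text):

```ruby
require 'csv'

# Flat list of cell text, as produced by data.xpath('//tr/td').text-style
# extraction once the row structure has been discarded; stand-in values.
cells = [
  "John Smith", "I live here 123", "phone ###", "Birthday", "Other Data",
  "Jane Doe",   "I live here 456", "phone ###", "Birthday", "Other Data"
]

# Group every five cells into one CSV row.
csv = CSV.generate do |out|
  cells.each_slice(5) { |row| out << row }
end

puts csv
```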

The layout of the HTML is:

<tr>
    <td>John Smith</td>
    <td>I live here 123</td>
    <td>phone ###</td>
    <td>Birthday</td>
    <td>Other Data</td>
</tr>

What the final product should look like:

http://picpaste.com/pics/Screenshot-KRnqRGrP.1361813552.png

Current output:

    john Smith      I live here 123  phone ### Birthday Other Data,

Upvotes: 0

Views: 1999

Answers (1)

the Tin Man

Reputation: 160631

This is pretty standard code to walk a table and extract its cells into an array of arrays. What you do with the data at that point is up to you, but it's very easy to pass it to CSV.

require 'nokogiri'
require 'pp'

doc = Nokogiri::HTML(<<EOT)
<table>
  <tr>
    <td>John Smith</td>
    <td>I live here 123</td>
    <td>phone ###</td>
    <td>Birthday</td>
    <td>Other Data</td>
  </tr>
  <tr>
    <td>John Smyth</td>
    <td>I live here 456</td>
    <td>phone ###</td>
    <td>Birthday</td>
    <td>Other Data</td>
  </tr>
</table>
EOT

data = []
doc.at('table').search('tr').each do |tr|
  data << tr.search('td').map(&:text)
end

pp data

Which outputs:

[["John Smith", "I live here 123", "phone ###", "Birthday", "Other Data"],
["John Smyth", "I live here 456", "phone ###", "Birthday", "Other Data"]]

The code uses at to locate the first <table>, then iterates over each <tr> using search. For each row, it iterates over the cells and extracts their text.

Nokogiri's at finds the first occurrence of something, and returns a Node. search finds all occurrences and returns a NodeSet, which acts like an array. I'm using CSS accessors, instead of XPath, for simplicity.


As an FYI:

File.open(DOC_URL_FILE, 'w'){|file| file.write all_data} 

can be written more succinctly as:

File.write(DOC_URL_FILE, all_data)

I've been working on this problem for a while. Can you give me any more help?

Sigh...

Did you read the CSV documentation, especially the examples? What happens if, instead of defining data = [], we replace it with:

CSV.open("path/to/file.csv", "wb") do |data|

and wrap the loop with the CSV block, like:

CSV.open("path/to/file.csv", "wb") do |data|
  doc.at('table').search('tr').each do |tr|
    data << tr.search('td').map(&:text)
  end
end

That's not tested, but it's really that simple. Go and fiddle with that.
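A quick stdlib-only sanity check of that CSV.open pattern, with hard-coded rows standing in for the Nokogiri loop and a hypothetical people.csv path:

```ruby
require 'csv'

# Rows as the Nokogiri loop would produce them; hard-coded for illustration.
rows = [
  ["John Smith", "I live here 123", "phone ###", "Birthday", "Other Data"],
  ["John Smyth", "I live here 456", "phone ###", "Birthday", "Other Data"]
]

# CSV.open yields a writer; shoveling an array in appends one formatted line.
CSV.open("people.csv", "wb") do |csv|
  rows.each { |row| csv << row }
end

puts File.read("people.csv")
```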

Upvotes: 5
