Jackson Riso

Reputation: 85

Open URLs from CSV

I am using Ruby 2.1.0p0 on Mac OS.

I'm parsing a CSV file and grabbing all the URLs, then using Nokogiri and OpenURI to scrape them, which is where I'm getting stuck.

When I try to use an each loop to run through the URLs array, I get this error:

initialize': No such file or directory @ rb_sysopen - URL (Errno::ENOENT)

When I create an array manually and loop through it, I get no error. I've tried to_s, URI::encode, and everything else I could think of or find on Stack Overflow.

I can copy and paste the URL from the CSV, or from the terminal after using puts on the array, and it opens in my browser with no problem. But when I try to open it with Nokogiri, it fails.

Here's my code:

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    require 'uri'
    require 'csv'

    events = Array.new
    CSV.foreach('productfeed.csv') do |row|
        events.push URI::encode(row[0]).to_s
    end

    events.each do |event|
        page = Nokogiri::HTML(open("#{event}"))

        # eventually, going to find info on the page, and scrape it, but not there yet.

        # something to show I didn't get an error
        puts "open = success"
    end

Please help! I am completely out of ideas.

Upvotes: 2

Views: 394

Answers (2)

Chris

Reputation: 1

I tried doing the same thing and found it to work better using a text file.

Here is what I did.

    #!/usr/bin/env python3

    # import the modules used below
    import os
    import sys
    import time
    import webbrowser

    DATA_PATH = '/home/user/Desktop/urls.txt'

    # open text file as "dataFile" and verify there is data in said file
    dataFile = open(DATA_PATH, 'r')
    if os.path.getsize(DATA_PATH) > 0:
        print("Data file opened successfully")
    else:
        print("!!!!NO DATA IN FILE!!!!")
        dataFile.close()
        sys.exit(1)

    # read file line by line, remove any spaces/newlines, and open link in chromium-browser
    for line in dataFile:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        print("Opening " + url)
        webbrowser.get('chromium-browser').open_new_tab(url)

    # close file
    print("Closing Data File")
    dataFile.close()

    # wait two seconds before printing "Data file closed".
    # this is purely for visual effect.
    time.sleep(2)
    print("Data file closed")

    # after opener has run, user is prompted to press enter key to exit.
    input("\n\nURL Opener has run. Press the enter key to exit.")

Hope this helps!

Upvotes: 0

tadman

Reputation: 211600

It looks like you're processing the header row, where one of those values is literally "URL". That's not a valid URI, so open-uri doesn't handle it and open falls back to treating it as a local file name, which is why you get Errno::ENOENT.
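
One way to confirm this is to check each value before handing it to open. The following is only a sketch: http_url? is a hypothetical helper name, and the example.com address is a placeholder, not something from your feed.

```ruby
require 'uri'

# Hypothetical helper: true only for strings that parse as http(s) URIs.
# URI::HTTPS is a subclass of URI::HTTP, so both schemes pass the check.
def http_url?(text)
  URI.parse(text.to_s).is_a?(URI::HTTP)
rescue URI::InvalidURIError
  # Strings that cannot be parsed at all are treated as non-URLs.
  false
end

puts http_url?('http://example.com/a') # placeholder URL
puts http_url?('URL')                  # the header value from the error message
```

The header value parses as a generic URI with no scheme, so it fails the check, while a real http or https address passes.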

The CSV module has a headers option that treats the first row as column names instead of data. Try turning that on and referring to row["URL"].
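
A minimal sketch of that, assuming the column is named "URL" as the error message suggests. The sample data, the "Name" column, and the extract_urls helper are made up for illustration; the real layout of productfeed.csv may differ.

```ruby
require 'csv'

# Hypothetical sample standing in for productfeed.csv.
SAMPLE_FEED = "URL,Name\n" \
              "http://example.com/a,First event\n" \
              "http://example.com/b,Second event\n"

# With headers: true the first line supplies the column names,
# so the literal string "URL" never ends up in the result.
def extract_urls(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row['URL'] }.compact
end

urls = extract_urls(SAMPLE_FEED)
puts urls
```

When reading from the file directly, the same option can be passed to foreach, as in CSV.foreach('productfeed.csv', headers: true), and each row can then be indexed with row['URL'] instead of row[0].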

Upvotes: 3
