Reputation: 85
I am using Ruby 2.1.0p0 on Mac OS.
I'm parsing a CSV file and grabbing all the URLs, then using Nokogiri and OpenURI to scrape them, which is where I'm getting stuck.
When I try to use an each loop to run through the URLs array, I get this error:
initialize': No such file or directory @ rb_sysopen - URL (Errno::ENOENT)
When I manually create an array and then run through it, I get no error. I've tried to_s, URI::encode, and everything else I could think of or find on Stack Overflow.
I can copy and paste the URL from the CSV, or from the terminal after using puts on the array, and it opens in my browser with no problem. But when I try to open it with Nokogiri, it fails.
Here's my code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'csv'
events = Array.new
CSV.foreach('productfeed.csv') do |row|
  events.push URI::encode(row[0]).to_s
end
events.each do |event|
  page = Nokogiri::HTML(open("#{event}"))
  #eventually, going to find info on the page, and scrape it, but not there yet.
  #something to show I didn't get an error
  puts "open = success"
end
Please help! I am completely out of ideas.
Upvotes: 2
Views: 394
Reputation: 1
I tried doing the same thing and found it to work better using a text file.
Here is what I did.
#!/usr/bin/env python3
# import the os, time, and webbrowser modules
import os
import time
import webbrowser
# path to the text file containing one URL per line
path = '/home/user/Desktop/urls.txt'
# open text file as "dataFile" and verify there is data in said file
dataFile = open(path, 'r')
if os.path.getsize(path) > 0:
    print("Data file opened successfully")
else:
    print("!!!!NO DATA IN FILE!!!!")
    dataFile.close()
    exit()
# read file line by line, remove any spaces/newlines, and open link in chromium-browser
for line in dataFile:
    url = line.strip()
    if not url:
        continue  # skip blank lines
    print("Opening " + url)
    webbrowser.get('chromium-browser').open_new_tab(url)
# close file and exit
print("Closing Data File")
dataFile.close()
# wait two seconds before printing "Data file closed".
# this is purely for visual effect.
time.sleep(2)
print("Data file closed")
# after opener has run, user is prompted to press enter key to exit.
input("\n\nURL Opener has run. Press the enter key to exit.")
exit()
Hope this helps!
Upvotes: 0
Reputation: 211600
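It looks like you're processing the header row, where one of those values is literally "URL". That's not a valid URI, so open-uri won't touch it; the call falls through to Ruby's regular open, which looks for a local file named URL and raises Errno::ENOENT.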
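One way to check this before calling open is to test whether each string actually parses as an HTTP(S) URL. The helper name http_url? below is made up for illustration; this is only a sketch of the idea, not part of open-uri:

```ruby
require 'uri'

# Hypothetical helper: true only for strings that parse as http/https URLs.
def http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

puts http_url?('URL')                   # false: no scheme, parses as a generic URI
puts http_url?('http://example.com/a')  # true
```

Printing the values that fail this check should make a stray header cell easy to spot.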
There's a headers option for the CSV module that will make use of the headers automatically. Try turning that on and referring to row["URL"].
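As a rough sketch of that suggestion: the helper name urls_from_csv and the sample data are invented here, and the column is assumed to be named "URL" as in the error message. A temporary file stands in for productfeed.csv so the example runs without the real feed:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: collect the values of one column, skipping the header row.
def urls_from_csv(path, column = 'URL')
  urls = []
  # headers: true makes CSV treat the first row as column names,
  # so each yielded row can be indexed by header instead of position.
  CSV.foreach(path, headers: true) do |row|
    value = row[column]
    urls << value.strip unless value.nil? || value.strip.empty?
  end
  urls
end

# Stand-in for productfeed.csv with made-up rows.
sample = Tempfile.new(['productfeed', '.csv'])
sample.write("URL,Name\nhttp://example.com/a,First\nhttp://example.com/b,Second\n")
sample.close

urls = urls_from_csv(sample.path)
puts urls
sample.unlink
```

Each entry in urls can then be passed to open (URI.open on newer Rubies) and on to Nokogiri::HTML as in the question; the literal header text never reaches open-uri.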
Upvotes: 3