Reputation: 464
This question applies to a web scraping project of mine, and I'm interested in finding out what the best practices are.
Currently I am scraping results from Craigslist for used cars. I get the listing text (listing), the price, the make, the model, and the year of the vehicle.
I have it set up like this:
i = 0
@listings = []
# craigslist_data is scraped via nokogiri
craigslist_data.each do |listing|
  @listings << Array.new
  @listings[i] << listing
  i += 1
end
I then use similar code blocks for price, make, model and year. I end up with something like this:
@listings = [["silver hyundai elantra 2004", "elantra", "hyundai", "$6000", "2004"], ["2008 chevy tahoe", "tahoe", "chevy", "$24000", "2008"]]
In a different post I was told that this style, i.e. iterating with a counter to push data into an array, was bad code. Can someone tell me what the correct way to do this would be?
Upvotes: 0
Views: 116
Reputation: 2099
One of the problems with pushing data into parallel arrays like this is that if your scraped data is not consistent (an error or something unexpected is present in the data), you could botch a bunch of your collected data. For example, let's say your scraper somehow ended up with this:
craigslist_data_years = [y1...y10] # size == 10
craigslist_data_descriptions = [d2...d10] # size == 9
There are 10 listings, but the first one is missing a description. When you push the data with your existing code, you implicitly assume it was the last one that was missing a description by matching y1 with d2, y2 with d3, etc. Now you've completely mismatched all your data.
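To make that concrete, here is a small made-up example of how positional pairing shifts everything after the gap:

years        = ["2004", "2008", "2012"]                  # 3 years scraped
descriptions = ["2008 chevy tahoe", "2012 honda civic"]  # first description missing

# Pairing by position, which is what index-based pushing amounts to:
years.zip(descriptions)
# => [["2004", "2008 chevy tahoe"], ["2008", "2012 honda civic"], ["2012", nil]]

The 2004 listing is now attached to the Tahoe's description, and every listing after the missing one is shifted as well.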
If I were to write this, I think I would create a small Listing class and build one Listing object per scraped result.
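A rough sketch of that idea (the Struct and the CSS selectors below are only illustrative, not taken from the original code):

Listing = Struct.new(:description, :make, :model, :price, :year)

@listings = craigslist_data.map do |node|
  # Pull every attribute out of the same node before moving on, so fields
  # from different listings can never be paired with each other.
  Listing.new(
    node.text.strip,              # listing description
    node.at_css(".make").text,    # the CSS selectors here are placeholders --
    node.at_css(".model").text,   # use whatever matches the markup you scrape
    node.at_css(".price").text,
    node.at_css(".year").text
  )
end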
So as you scrape a single listing, grab all the attributes, instantiate a Listing object, and put that entire object into the @listings array.
Additionally, if you still want flat arrays of the prices/descriptions/etc. of all listings, you can get one by doing something like
listing_prices = @listings.map {|listing| listing.price}
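Or, more concisely, using the Symbol#to_proc shorthand:

listing_prices = @listings.map(&:price)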
Upvotes: 1