Kwestion

Reputation: 464

What is the proper way to aggregate data that belongs to a single object in Rails?

This applies to a web scraping project I have and I'm interested in finding out what the best practices are.

Currently I am scraping results from Craigslist for used cars. I get the listing text (listing), the price, the make, the model, and the year of the vehicle.

Currently I have it set up like this:

i = 0
@listings = []
# craigslist_data is scraped via nokogiri
craigslist_data.each do |listing|
  @listings << Array.new
  @listings[i] << listing
  i += 1
end

I then use similar code blocks for price, make, model and year. I end up with something like this:

@listings = [["silver hyundai elantra 2004", "elantra", "hyundai", "$6000", "2004"], ["2008 chevy tahoe", "tahoe", "chevy", "$24000", "2008"]]

In a different post I was told that this style, i.e. iterating to push data into an array, is bad code. Can someone tell me what the correct way to do this would be?

Upvotes: 0

Views: 116

Answers (1)

arbylee

Reputation: 2099

One of the problems with pushing data into the array is that if your scraped data is not consistent (an error or something unexpected in the data is present), you could botch a bunch of your collected data. For example, let's say your scraper ended up with this somehow:

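# Placeholder values standing in for scraped data
craigslist_data_years = (1..10).map { |n| "y#{n}" }        # y1 through y10, size == 10
craigslist_data_descriptions = (2..10).map { |n| "d#{n}" } # d2 through d10, size == 9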

There are 10 listings, but the first one is missing a description. Because your existing code pairs values by position, it matches y1 with d2, y2 with d3, and so on, which implicitly assumes the last listing was the one missing a description. Now all of your data is mismatched.
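To make the mismatch concrete, here is a minimal sketch with three made-up listings. The method name pair_by_index and the sample values are hypothetical; it only imitates what positional pairing does, it is not your scraper.

```ruby
# Hypothetical helper: pairs two separately scraped columns by position,
# the same way pushing values into @listings[i] one attribute at a time does.
def pair_by_index(years, descriptions)
  years.zip(descriptions)
end

years        = %w[2004 2008 2011]
# The description for the 2004 car was never scraped, so this column is one short.
descriptions = ["2008 chevy tahoe", "2011 ford focus"]

pair_by_index(years, descriptions).each do |year, description|
  puts "#{year}: #{description.inspect}"
end
```

Every year ends up next to the description of the following listing, and the last year gets nil, even though the gap was actually at the front.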

If I were to write this, I think I would:

  • Create a Listing class with description, price, make, model, and year attributes
  • Change the scraper to scrape all the attributes of a single listing at a time
  • Add listings to the array as you scrape.

So as you scrape a single listing, grab all the attributes, instantiate a Listing object, and put that entire object into the @listings array.
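Roughly, that could look like the sketch below. It assumes a plain Ruby class rather than an ActiveRecord model, and it uses an array of hashes as a stand-in for whatever your Nokogiri code extracts per listing; build_listings and the hash keys are names I made up for illustration.

```ruby
# Plain Ruby object holding everything known about one listing.
class Listing
  attr_reader :description, :price, :make, :model, :year

  def initialize(description:, price:, make:, model:, year:)
    @description = description
    @price = price
    @make = make
    @model = model
    @year = year
  end
end

# Hypothetical helper: each element of raw_listings holds all attributes
# scraped from ONE listing, so a missing value stays with its own listing.
def build_listings(raw_listings)
  raw_listings.map { |attrs| Listing.new(**attrs) }
end

# Stand-in for scraped data (not real scraper output).
raw_listings = [
  { description: "silver hyundai elantra 2004", price: "$6000",
    make: "hyundai", model: "elantra", year: "2004" },
  { description: nil, price: "$24000",
    make: "chevy", model: "tahoe", year: "2008" } # description missing
]

@listings = build_listings(raw_listings)
```

With this shape, a listing that lacks a description just has a nil description; its price, make, model, and year are still attached to the right car.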

Additionally, if you wanted to still have arrays that include the price/descriptions/etc of all listings, you can achieve this by doing something like

listing_prices = @listings.map(&:price)

Upvotes: 1
