Reputation: 13
I am currently working on a small application that will allow someone to enter strings that'll separate a larger file into smaller files.
I am having trouble getting the file to be split into the newer files. This is my code for my CSV method:
require 'csv'
new_list = []
old_list = []
def import_csv(file)
puts "Method begins"
CSV.foreach("public_schools_list.csv", :encoding => 'windows-1251:utf-8', :headers => true) do |row|
if row["company"].downcase.include?("academy" || "lutheran" || "jewish" || "private" || "christian")
CSV.open("new_list.csv", "ab") do |n|
n << row
puts "First Row"
new_list << n
end
else
CSV.open("old_list.csv", "ab") do |o|
o << row
puts "Second Row"
old_list << o
end
end
end
end
puts "New Csv: #{new_list.count}"
puts "Old Csv: #{old_list.count}"
I am just trying to check this code to see if it is splitting the files. I'm not sure if some of this is correct. There are currently only four items in the CSV list. I use the count
method to check if they go into the correct files.
What code am I missing to get the main file to be separated into two?
here is my controller:
include CSVUpload
def import
csv_separate(params[:file].tempfile)
redirect_to root_url
end
then here is the module i am using:
require 'csv'
module CSVUpload
def csv_separate(file)
new_list_counter = 0
old_list_counter = 0
# puts "Method begins"
CSV.open("new_list.csv", "ab") do |new_list|
CSV.open("old_list.csv", "ab") do |old_list|
CSV.foreach(file, :encoding => 'windows-1251:utf-8', :headers => true) do |row|
if row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i]
new_list << row
new_list_counter += 1
else
old_list << row
old_list_counter += 1
end
end
end
end
end
end
and then here is the form:
<div>
<h3>Import a CSV File</h3>
<%= form_tag({action: "import"}, multipart: true) do %>
<%= file_field_tag("file") %>
<%= submit_tag "Import CSV" %>
<% end %>
</div>
i hope that helps out a bit. Thanks!
Upvotes: 0
Views: 151
Reputation: 160551
Don't open then close the output files for every line you read. That's horribly inefficient and wastes CPU time. Open them outside the CSV.foreach
loop, then write to them conditionally.
Also, don't aggregate the lines of the file in memory just so you can count them. Instead, increment a counter.
Also, include?
doesn't work this way:
include?("academy" || "lutheran" || "jewish" || "private" || "christian")
The documentation says:
str.include? other_str -> true or false ------------------------------------------------------------------------------ Returns true if str contains the given string orcharacter. "hello".include? "lo" #=> true "hello".include? "ol" #=> false "hello".include? ?h #=> true
Notice that it takes a single character or string. Using an or'd list of strings results in only the first string:
"academy" || "lutheran" || "jewish" || "private" || "christian" # => "academy"
As a result, only "academy" would ever be tested for inclusion.
This is untested but looks right enough:
require 'csv'
def import_csv(file)
new_list_counter = 0
old_list_counter = 0
puts "Method begins"
CSV.open("new_list.csv", "ab") do |new_list|
CSV.open("old_list.csv", "ab") do |old_list|
CSV.foreach("public_schools_list.csv", :encoding => 'windows-1251:utf-8', :headers => true) do |row|
if row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i]
new_list << row
puts "First Row"
new_list_counter += 1
else
old_list << row
puts "Second Row"
old_list_counter += 1
end
end
end
end
puts "New CSV: #{ new_list_counter }"
puts "Old CSV: #{ old_list_counter }"
end
Notice /\b(?:academy|lutheran|jewish|private|christian)\b/i
. This tests for multiple sub-strings that are complete words. \b
in regex-ese means "word-boundary", so each of the embedded words or'd together using |
are tested separately. /i
means it's case-insensitive. To accomplish the same thing using include?
would involve looping over a list of words, which is one of those algorithms that grows slower and slower as you add more words to test.
row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i]
is extremely efficient in comparison to that possible loop-algorithm. Using regex patterns aren't a universal panacea. Used incorrectly regex patterns can actually slow your program, sometimes drastically. Used correctly though, they can reduce your code and, in the right situation, when written correctly, can provide dramatic speed improvements. So, check using benchmarks before you blindly introduce them into code.
Upvotes: 2