Reputation: 11639
I have a CSV that looks like this:
user_id,is_user_unsubscribed
131072,1
7077888,1
11010048,1
12386304,1
327936,1
2228480,1
6553856,1
9830656,1
10158336,1
10486016,1
10617088,1
11010304,1
11272448,1
393728,1
7012864,1
8782336,1
11338240,1
11928064,1
4326144,1
8127232,1
11862784,1
but I want the data to look like this:
131072
7077888
11010048
12386304
327936
...
any ideas on what to do? I have 330,000 rows...
Upvotes: 2
Views: 2633
Reputation: 2092
I have 330,000 rows...
So I guess speed matters, right?
I took your method and the other 2 that was proposed, tested them on a 330,000 rows csv file and made a benchmark to show you something interesting.
require 'csv'
require 'benchmark'
Benchmark.bm(10) do |bm|
bm.report("Method 1:") {
data = Array.new
CSV.foreach("input.csv", headers:true) do |row|
data << row['user_id']
end
}
bm.report("Method 2:") {
data = CSV.read("input.csv")[1 .. -1]
data.delete("is_user_unsubscribed")
}
bm.report("Method 3:") {
data = Array.new
File.open('input.csv').read.each_line do |line|
data << line.split(',')[0]
end
data.shift # => remove headers
}
end
The output:
user system total real
Method 1: 3.110000 0.010000 3.120000 ( 3.129409)
Method 2: 1.990000 0.010000 2.000000 ( 2.004016)
Method 3: 0.380000 0.010000 0.390000 ( 0.383700)
As you can see handling the CSV file as a simple text file, splitting the lines and pushing them into the array is ~5 times faster than using CSV Module. Of course it has some disadvantages too; i.e., if you'll ever add columns in the input file you'll have to review the code.
It's up to you if you prefer lightspeed code or easier scalability.
Upvotes: 1
Reputation: 110675
I'm guessing that you plan to convert each string that precedes a comma to an integer. If so,
CSV.read("dataset.csv").drop(1).map(:to_i)
is all you need. (For example, "131072,1".to_i #=> 131072
.)
If you want strings, you could write
CSV.read("dataset.csv").drop(1).map { |s| s[/d+/] }
Upvotes: 0
Reputation: 2411
You can read your file as an array and ignore the first row like this:
data = CSV.read("dataset.csv")[1 .. -1]
This way you can remove the header.
Regarding the column, you can delete a column like this:
data = CSV.read("dataset.csv")[1 .. -1]
data.delete("is_user_unsubscribed")
data.to_csv # => The new CSV in string format
Check this for more info: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Table.html
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
Upvotes: 4
Reputation: 11639
I wound up doing this. Is this kosher?
user_ids = []
[]
CSV.foreach("eds_users_sept15.csv", headers:true) do |row|
user_ids << row['user_id']
end
nil
user_ids.count
322101
CSV.open('some_new_file.csv', 'w') do |c|
user_ids.each do |id|
c << [id]
end
end
Upvotes: 1
Reputation: 56
My recommendation would be to read in a line from your file as a string, then split the String that you get by commas (there is a comma separating your columns).
Splitting a Ruby String: https://code-maven.com/ruby-split
require 'pp'
line_num=0
text=File.open('myfile.csv').read
text.each_line do |line|
textArray = line.split
textIWant = textArray[0]
line_num = line_num + 1
print "#{textIWant}"
end
In this code we open a text file, and read line by line. Each line we split into the text we want by choosing the text from the first column (zeroth item in the array), then print it.
If you do not want the headers, when line_num = 0, add an if statement to not pick up the data. Even better use unless
.
Just rewrite a new file with your new data.
Upvotes: 1