Jwan622

Reputation: 11639

How to remove the headers and the second column from a CSV in Ruby?

I have a CSV that looks like this:

user_id,is_user_unsubscribed
131072,1
7077888,1
11010048,1
12386304,1
327936,1
2228480,1
6553856,1
9830656,1
10158336,1
10486016,1
10617088,1
11010304,1
11272448,1
393728,1
7012864,1
8782336,1
11338240,1
11928064,1
4326144,1
8127232,1
11862784,1

but I want the data to look like this:

131072
7077888
11010048
12386304
327936
...

Any ideas on what to do? I have 330,000 rows...

Upvotes: 2

Views: 2633

Answers (5)

Marco

Reputation: 2092

I have 330,000 rows...

So I guess speed matters, right?

I took your method and the other two that were proposed, ran them against a 330,000-row CSV file, and benchmarked them. The results are interesting.

require 'csv'
require 'benchmark'

Benchmark.bm(10) do |bm|
    bm.report("Method 1:") {
        data = Array.new
        CSV.foreach("input.csv", headers:true) do |row|
            data << row['user_id']
        end
    }
    bm.report("Method 2:") {
        data = CSV.read("input.csv")[1 .. -1]
        data.delete("is_user_unsubscribed")
    }
    bm.report("Method 3:") {
        data = Array.new
        File.open('input.csv').read.each_line do |line|
            data << line.split(',')[0]
        end
        data.shift # => remove headers
    }
end

The output:

                 user     system      total        real
Method 1:    3.110000   0.010000   3.120000 (  3.129409)
Method 2:    1.990000   0.010000   2.000000 (  2.004016)
Method 3:    0.380000   0.010000   0.390000 (  0.383700)

As you can see, treating the CSV file as plain text, splitting each line, and pushing the first field into an array is about 5 times faster than Method 2 and about 8 times faster than Method 1, both of which use the CSV module. Of course it has disadvantages too: for example, if you ever add columns to the input file, you will have to review the code.

It's up to you whether you prefer raw speed or code that is easier to extend.
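One more trade-off worth keeping in mind, sketched here on made-up rows: String#split knows nothing about CSV quoting, while CSV.parse_line does. The quoted name field below is hypothetical and does not occur in the file from the question; for a file that holds only integers, both approaches return the same first column.

```ruby
require 'csv'

# Hypothetical rows: the second one has a quoted field that contains a comma.
plain_row  = "131072,1"
quoted_row = "\"Smith, John\",1"

split_plain  = plain_row.split(',')[0]        # naive text split
csv_plain    = CSV.parse_line(plain_row)[0]   # CSV-aware parse
split_quoted = quoted_row.split(',')[0]       # cuts inside the quoted field
csv_quoted   = CSV.parse_line(quoted_row)[0]  # keeps the quoted field whole

puts split_plain   # 131072
puts csv_plain     # 131072
puts split_quoted  # "Smith
puts csv_quoted    # Smith, John
```

So the plain-text shortcut is safe only as long as no field can contain a comma.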

Upvotes: 1

Cary Swoveland

Reputation: 110675

I'm guessing that you plan to convert each string that precedes a comma to an integer. If so,

File.readlines("dataset.csv").drop(1).map(&:to_i)

is all you need. (For example, "131072,1".to_i #=> 131072.)

If you want strings, you could write

File.readlines("dataset.csv").drop(1).map { |s| s[/\d+/] }
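As a small illustration of why both conversions pick out the first column, here is a sketch that uses one line of the question's data as a literal string. String#to_i reads the leading digits and stops at the first character that cannot belong to an integer, and the regex /\d+/ returns the first run of digits as a string. The variable names are only for illustration.

```ruby
line   = "131072,1\n"                     # one raw text line from the file
header = "user_id,is_user_unsubscribed\n" # the header line

as_integer = line.to_i     # parsing stops at the comma
as_string  = line[/\d+/]   # first run of digits, still a String
header_int = header.to_i   # no leading digits, so this is 0

p as_integer  # 131072
p as_string   # "131072"
p header_int  # 0
```

The last value shows why the header line has to be dropped before converting.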

Upvotes: 0

Gabriel Mesquita

Reputation: 2411

You can read your file as an array and ignore the first row like this:

data = CSV.read("dataset.csv")[1 .. -1]

This way you can remove the header.

Regarding the column, you can delete a column like this:

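require 'csv'
table = CSV.read("dataset.csv", headers: true) # => CSV::Table; the first row is parsed as headers
table.delete("is_user_unsubscribed")            # removes that column from every row
table.to_csv # => The new CSV in string format (it still starts with the remaining header)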

Check this for more info: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Table.html

http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
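Instead of deleting the unwanted column, you can also ask a CSV::Table for the one column you want. The following is a minimal sketch, assuming the first rows of the question's file inlined as a string so that it runs on its own; the names sample and user_ids are only illustrative.

```ruby
require 'csv'

# First rows of the question's file, inlined so the sketch is self-contained.
sample = <<~CSV_TEXT
  user_id,is_user_unsubscribed
  131072,1
  7077888,1
  11010048,1
CSV_TEXT

table    = CSV.parse(sample, headers: true)  # => CSV::Table
user_ids = table["user_id"]                  # column values only, header excluded

puts user_ids
```

Because the header row is parsed as headers rather than data, it does not show up in user_ids.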

Upvotes: 4

Jwan622

Reputation: 11639

I wound up doing this. Is this kosher?

require 'csv'

user_ids = []
CSV.foreach("eds_users_sept15.csv", headers: true) do |row|
  user_ids << row['user_id']
end
user_ids.count # => 322101

# Write each id as a single-column row; no header row is written
CSV.open('some_new_file.csv', 'w') do |c|
  user_ids.each do |id|
    c << [id]
  end
end
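If you do not need the ids in memory afterwards, the read and the write can be combined into one pass, so the array is never built. This is only a sketch of that idea; copy_user_ids is a hypothetical helper name, not something from the CSV library.

```ruby
require 'csv'

# Hypothetical helper: copies only the user_id column to a new file,
# without a header row, and returns the number of rows written.
def copy_user_ids(input_path, output_path)
  count = 0
  CSV.open(output_path, 'w') do |out|
    CSV.foreach(input_path, headers: true) do |row|
      out << [row['user_id']]  # one single-column row per input row
      count += 1
    end
  end
  count
end
```

With the file names from above, the call would be copy_user_ids("eds_users_sept15.csv", "some_new_file.csv").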

Upvotes: 1

D. Wood

Reputation: 56

My recommendation would be to read each line of your file as a string, then split that string on commas, since a comma separates your columns.

Splitting a Ruby String: https://code-maven.com/ruby-split

line_num = 0
text = File.read('myfile.csv')
text.each_line do |line|
  text_array = line.split(',')  # split the line on the commas
  text_i_want = text_array[0]   # first column (zeroth item in the array)
  puts text_i_want              # print one value per line
  line_num += 1
end

In this code we open a text file and read it line by line. We split each line on the commas, take the text from the first column (the zeroth item in the array), and print it.

If you do not want the headers, add an if statement that skips the line when line_num is 0. Even better, use unless.

Then write the new data to a new file.
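Putting those pieces together, a rough sketch could look like the following. write_first_column is a hypothetical helper name, and it assumes that no field in the file contains a comma.

```ruby
# Hypothetical helper: writes the first column of every data line
# of input_path to output_path, skipping the header line.
def write_first_column(input_path, output_path)
  File.open(output_path, 'w') do |out|
    File.foreach(input_path).with_index do |line, line_num|
      # line_num is 0 for the header row, so that line is not written
      out.puts line.split(',')[0] unless line_num == 0
    end
  end
end
```

File.foreach reads one line at a time, so the whole file never has to fit in memory.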

Upvotes: 1
