Kristada673
Kristada673

Reputation: 3744

How do I modify multiple columns in a CSV, and then copy them to a new CSV using Ruby?

Out of the 10 columns there in the original CSV, I have 4 columns which I need to make integers (to process with MATLAB later; the other 6 columns already contain integer values). These 4 columns are: (1) platform (2) push (3) timestamp, and (4) udid.

An example input is: #other_column, Android, Y, 10-05-2015 3:59:59 PM, #other_column, d0155049772de9, #other_columns

The corresponding output should be: #other_column, 2, 1, 1431273612198, #other_column, 17923, #other_columns

So, I wrote the following code:

require 'csv'

CSV.open('C:\Users\hp1\Desktop\Datasets\NewColumns2.csv', "wb") do |csv|
  CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map do |row|

        if row['platform']=='Android'
            row['platform']=2
        elsif row['platform']=='iPhone'
            row['platform']=1
        end


        if row['push']=='Y'
            row['push']=1
        elsif row['push']=='N'
            row['push']=0
        end


        row['timestamp'].to_time.to_i

        row['udid'].to_i

        csv<<row
    end
end

Now, the first 3 columns, weekday, platform and push, are having a small number of unique values for the whole file (i.e., 7, 2 and 2 respectively), which is why I used the above approach. However, the other 2 columns, timestamp and udid, are different - they have several values, a few of them common to some rows in the CSV, but there are thousands of unique values. And hence I thought of converting them to integers in the manner I showed above.

Anyhow, none of the columns are getting converted at all. Plus, there is another problem with the datetime column as it is in a format which Ruby apparently does not recognize as a legitimate time format (a sample looks like this: 10-05-2015 3:59:59 PM). So, what should I do? Thanks.

Upvotes: 1

Views: 87

Answers (3)

shaheenery
shaheenery

Reputation: 8507

Edit - Redo, I missed part of the problem with the udids

Problems

  1. You are using map when you don't need to, CSV#foreach already iterates through all of the rows - remove this
  2. Date - include the ruby standard Time library
  3. Unique ids - it sounds like you want to convert the udid into a shorter unique id since there may be more than one entry per mobile device - use an array to make a collection without repeats and use the index of the device udid in the array as your new shorter unique id

I used this as my input csv:

othercol1,platform,push,timestamp,othercol2,udid,othercol3,othercol4,othercol5,othercol6
11,Android, N, 10-05-2015  3:59:59 PM,22, d0155049772de9,33,44,55,66
11,iPhone, N, 10-05-2015  5:59:59 PM,22, d0155044772de9,33,44,55,66
11,iPhone, Y, 10-06-2015  3:59:59 PM,22, d0155049772de9,33,44,55,66
11,Android, Y, 11-05-2015  3:59:59 PM,22, d0155249772de9,33,44,55,66

Here is my output csv:

11,2,0,1431298799,22,1,33,44,55,66
11,1,0,1431305999,22,2,33,44,55,66
11,1,1,1433977199,22,1,33,44,55,66
11,2,1,1431385199,22,3,33,44,55,66

Here is the script I used:

require 'time' # use ruby standard time library to parse for you
require 'csv'

udids = [] # turn the udid in to a shorter unique id 
CSV.open('new.csv', "wb") do |csv|
  CSV.foreach('old.csv', headers: true) do |row|

        if row['platform']=='Android'
            row['platform']=2
        elsif row['platform']=='iPhone'
            row['platform']=1
        end

        if row['push'].strip =='Y'
            row['push']=1
        elsif row['push'].strip =='N'
            row['push']=0
        end

        row['timestamp'] = Time.parse(row['timestamp']).to_i

        # turn the udid in to a shorter unique id
        unless udids.include?(row['udid'])
          udids << row['udid']
        end

        row['udid'] = udids.index(row['udid']) + 1

        csv << row
    end
end

Upvotes: 1

Shelvacu
Shelvacu

Reputation: 4362

add :converters => :all to your options, so that the dates and numbers are automatically converted. Then, instead of

row['timestamp'].to_time.to_i

which does the conversion but doesn't put it anywhere (it is not in-place), do this:

row['timestamp'] = row['timestamp'].to_time.to_i

note that this only works with converters, otherwise row['timestamp'] is a string and there is no .to_time method.

Upvotes: 0

Lucas
Lucas

Reputation: 378

This is a wrong usage of map, this is not the function you need. Map is if you want to apply a function to all values in the array, and return the array. What you are doing is iterate, doing some changes, then pushing the modified row into a new array - you can just iterate, no need for the map function to be there: CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true) instead of CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map

About the date, you can use strptime to transform string into date: DateTime.strptime("10-05-2015 3:59:59 PM", "%d-%m-%Y %l:%M:%S %p"). Here the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html

Upvotes: 0

Related Questions