Reputation: 3744
Out of the 10 columns there in the original CSV, I have 4 columns which I need to make integers (to process with MATLAB later; the other 6 columns already contain integer values). These 4 columns are: (1) platform (2) push (3) timestamp, and (4) udid.
An example input is: #other_column, Android, Y, 10-05-2015 3:59:59 PM, #other_column, d0155049772de9, #other_columns
The corresponding output should be: #other_column, 2, 1, 1431273612198, #other_column, 17923, #other_columns
So, I wrote the following code:
require 'csv'
CSV.open('C:\Users\hp1\Desktop\Datasets\NewColumns2.csv', "wb") do |csv|
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push']=='Y'
row['push']=1
elsif row['push']=='N'
row['push']=0
end
row['timestamp'].to_time.to_i
row['udid'].to_i
csv<<row
end
end
Now, the first 3 columns, weekday
, platform
and push
, are having a small number of unique values for the whole file (i.e., 7, 2 and 2 respectively), which is why I used the above approach. However, the other 2 columns, timestamp
and udid
, are different - they have several values, a few of them common to some rows in the CSV, but there are thousands of unique values. And hence I thought of converting them to integers in the manner I showed above.
Anyhow, none of the columns are getting converted at all. Plus, there is another problem with the datetime
column as it is in a format which Ruby apparently does not recognize as a legitimate time format (a sample looks like this: 10-05-2015 3:59:59 PM
). So, what should I do? Thanks.
Upvotes: 1
Views: 87
Reputation: 8507
Edit - Redo, I missed part of the problem with the udids
Problems
I used this as my input csv:
othercol1,platform,push,timestamp,othercol2,udid,othercol3,othercol4,othercol5,othercol6
11,Android, N, 10-05-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,iPhone, N, 10-05-2015 5:59:59 PM,22, d0155044772de9,33,44,55,66
11,iPhone, Y, 10-06-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,Android, Y, 11-05-2015 3:59:59 PM,22, d0155249772de9,33,44,55,66
Here is my output csv:
11,2,0,1431298799,22,1,33,44,55,66
11,1,0,1431305999,22,2,33,44,55,66
11,1,1,1433977199,22,1,33,44,55,66
11,2,1,1431385199,22,3,33,44,55,66
Here is the script I used:
require 'time' # use ruby standard time library to parse for you
require 'csv'
udids = [] # turn the udid in to a shorter unique id
CSV.open('new.csv', "wb") do |csv|
CSV.foreach('old.csv', headers: true) do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push'].strip =='Y'
row['push']=1
elsif row['push'].strip =='N'
row['push']=0
end
row['timestamp'] = Time.parse(row['timestamp']).to_i
# turn the udid in to a shorter unique id
unless udids.include?(row['udid'])
udids << row['udid']
end
row['udid'] = udids.index(row['udid']) + 1
csv << row
end
end
Upvotes: 1
Reputation: 4362
add :converters => :all
to your options, so that the dates and numbers are automatically converted. Then, instead of
row['timestamp'].to_time.to_i
which does the conversion but doesn't put it anywhere (it is not in-place), do this:
row['timestamp'] = row['timestamp'].to_time.to_i
note that this only works with converters, otherwise row['timestamp']
is a string and there is no .to_time
method.
Upvotes: 0
Reputation: 378
This is a wrong usage of map, this is not the function you need. Map is if you want to apply a function to all values in the array, and return the array. What you are doing is iterate, doing some changes, then pushing the modified row into a new array - you can just iterate, no need for the map function to be there:
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true)
instead of CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map
About the date, you can use strptime to transform string into date: DateTime.strptime("10-05-2015 3:59:59 PM", "%d-%m-%Y %l:%M:%S %p")
. Here the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html
Upvotes: 0