Reputation: 6381
I am running a command-line program called Primer 3. It takes an input file and writes its results to standard output. I am trying to write a Ruby script that reads that output and puts the entries into a hash.
The results returned are below. I would like to split the data on the '=' sign, so that the hash would look something like this:
{:SEQUENCE_ID => "example", :SEQUENCE_TEMPLATE => "GTAGTCAGTAGACNAT..etc", :SEQUENCE_TARGET => "37,21" etc }
I would also like to lowercase the keys, i.e.:
{:sequence_id => "example", :sequence_template => "GTAGTCAGTAGACNAT..etc", :sequence_target => "37,21" etc }
This is my current script:
#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
name, height = line.split(/\=/)
primer3[name] = height.to_i
end
puts primer3
It is returning this:
Primer 3 hash
{"SEQUENCE_ID"=>0, "SEQUENCE_TEMPLATE"=>0, "SEQUENCE_TARGET"=>37, "PRIMER_TASK"=>0, "PRIMER_PICK_LEFT_PRIMER"=>1, "PRIMER_PICK_INTERNAL_OLIGO"=>1, "PRIMER_PICK_RIGHT_PRIMER"=>1, "PRIMER_OPT_SIZE"=>18, "PRIMER_MIN_SIZE"=>15, "PRIMER_MAX_SIZE"=>21, "PRIMER_MAX_NS_ACCEPTED"=>1, "PRIMER_PRODUCT_SIZE_RANGE"=>75, "P3_FILE_FLAG"=>1, "SEQUENCE_INTERNAL_EXCLUDED_REGION"=>37, "PRIMER_EXPLAIN_FLAG"=>1, "PRIMER_THERMODYNAMIC_PARAMETERS_PATH"=>0, "PRIMER_LEFT_EXPLAIN"=>0, "PRIMER_RIGHT_EXPLAIN"=>0, "PRIMER_INTERNAL_EXPLAIN"=>0, "PRIMER_PAIR_EXPLAIN"=>0, "PRIMER_LEFT_NUM_RETURNED"=>0, "PRIMER_RIGHT_NUM_RETURNED"=>0, "PRIMER_INTERNAL_NUM_RETURNED"=>0, "PRIMER_PAIR_NUM_RETURNED"=>0, ""=>0}
Data source
SEQUENCE_ID=example
SEQUENCE_TEMPLATE=GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG
SEQUENCE_TARGET=37,21
PRIMER_TASK=pick_detection_primers
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=1
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=21
PRIMER_MAX_NS_ACCEPTED=1
PRIMER_PRODUCT_SIZE_RANGE=75-100
P3_FILE_FLAG=1
SEQUENCE_INTERNAL_EXCLUDED_REGION=37,21
PRIMER_EXPLAIN_FLAG=1
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/
PRIMER_LEFT_EXPLAIN=considered 65, too many Ns 17, low tm 48, ok 0
PRIMER_RIGHT_EXPLAIN=considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35
PRIMER_INTERNAL_EXPLAIN=considered 0, ok 0
PRIMER_PAIR_EXPLAIN=considered 0, ok 0
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=0
=
$ primer3_core < example2 | ruby /Users/sean/Dropbox/bin/rb/read_primer3.rb
Upvotes: 3
Views: 202
Reputation: 303215
For fun, here are a couple of purely-functional solutions. Both assume that you've already pulled your data from the file, e.g.
my_data = ARGF.read # read the file passed on the command line
This one feels sort of gross, but it is a (long) one-liner :)
hash = Hash[ my_data.lines.map{ |line|
line.chomp.split('=',2).map.with_index{ |s,i| i==0 ? s.downcase.to_sym : s }
} ]
This one is two lines, but feels cleaner than using with_index:
keys,values = my_data.lines.map{ |line| line.chomp.split('=',2) }.transpose
hash = Hash[ keys.map(&:downcase).map(&:to_sym).zip(values) ]
Both of these are likely less efficient and certainly more memory-intensive than your already-accepted answer; iterating over the lines and gradually mutating your hash is the best way to go. These non-mutating variations are just a mental exercise.
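To make the transpose variation easier to follow, here is a sketch of its intermediate values on a two-line sample. The sample string is taken from the question's data; in the real script my_data would come from ARGF.read, and the pairs variable is only introduced here for illustration.

```ruby
# Two lines from the question's data stand in for the full Primer 3 output.
my_data = "SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n"

# One [key, value] array per line; the limit of 2 splits on the first '=' only.
pairs = my_data.lines.map { |line| line.chomp.split('=', 2) }

# transpose turns the rows into two columns: all keys, then all values.
keys, values = pairs.transpose

# Lowercase and symbolize the keys, then pair them back up with the values.
hash = Hash[keys.map(&:downcase).map(&:to_sym).zip(values)]

p keys
p values
p hash
```

Note that transpose needs every row to have the same length, so a blank line in the input would raise an IndexError with this approach.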
Your final answer should use ARGF so that it accepts input either from filenames given on the command line or from STDIN. I would write it like so:
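#!/usr/bin/ruby
module Primer3
  # Build a hash of lowercase symbol keys => string values from any
  # object that responds to each_line (a File, ARGF, a StringIO, ...).
  def self.parse( file )
    {}.tap do |primer3|
      # Process one line at a time, without reading it all into memory first
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end
# Parse and print the result only when this file is run directly
p Primer3.parse( ARGF ) if __FILE__==$0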
This way you can either run the file from the command line, with or without STDIN, or you can require this file and use the module function it defines in other code.
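As a rough sketch of the second use case, the module function can be fed anything that responds to each_line. Here the parser from above is repeated so the example runs on its own, and a StringIO (my stand-in, not something from the question) plays the role of the file; the two sample lines come from the question's data.

```ruby
require 'stringio'

# Same parser as above, repeated so this sketch is self-contained.
module Primer3
  def self.parse(file)
    {}.tap do |primer3|
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

# StringIO stands in for an open File or ARGF.
sample = StringIO.new("SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n")
result = Primer3.parse(sample)

p result[:sequence_id]
p result[:sequence_target]
```

In other code you would normally require the file that defines Primer3 rather than repeating the module, then pass File.open(...) or ARGF to Primer3.parse in the same way.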
Upvotes: 4
Reputation: 14082
#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
key, value = line.split(/=/, 2)
primer3[key.downcase.to_sym] = value.chomp
end
puts primer3
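Two details in the script above do the real work: the limit argument to split keeps any later '=' characters inside the value, and chomp removes the line terminator that gets leaves on each line. A small sketch of both, using a made-up line (the PRIMER_EXAMPLE key and its value are hypothetical, not real Primer 3 output):

```ruby
# Hypothetical input line whose value itself contains an '=' sign.
line = "PRIMER_EXAMPLE=a=b\n"

# Without a limit, split breaks on every '=' and the value is cut short.
_key, truncated = line.split(/=/)
p truncated

# With a limit of 2, everything after the first '=' stays in the value.
key, value = line.split(/=/, 2)
p value

# chomp strips the trailing newline; downcase.to_sym gives the symbol key.
clean_value = value.chomp
symbol_key = key.downcase.to_sym
p clean_value
p symbol_key
```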
Upvotes: 4
Reputation: 6381
OK, I have it (almost). The only problem is that it leaves a \n at the end of each value.
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
key, value = line.split(/\=/)
puts key
puts value
primer3[key.downcase] = value
end
puts primer3
{"sequence_id"=>"example\n", "sequence_template"=>"GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG\n", "sequence_target"=>"37,21\n", "primer_task"=>"pick_detection_primers\n", "primer_pick_left_primer"=>"1\n", "primer_pick_internal_oligo"=>"1\n", "primer_pick_right_primer"=>"1\n", "primer_opt_size"=>"18\n", "primer_min_size"=>"15\n", "primer_max_size"=>"21\n", "primer_max_ns_accepted"=>"1\n", "primer_product_size_range"=>"75-100\n", "p3_file_flag"=>"1\n", "sequence_internal_excluded_region"=>"37,21\n", "primer_explain_flag"=>"1\n", "primer_thermodynamic_parameters_path"=>"/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/\n", "primer_left_explain"=>"considered 65, too many Ns 17, low tm 48, ok 0\n", "primer_right_explain"=>"considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35\n", "primer_internal_explain"=>"considered 0, ok 0\n", "primer_pair_explain"=>"considered 0, ok 0\n", "primer_left_num_returned"=>"0\n", "primer_right_num_returned"=>"0\n", "primer_internal_num_returned"=>"0\n", "primer_pair_num_returned"=>"0\n", ""=>"\n"}
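The trailing \n is there because gets returns each line together with its line terminator. A minimal sketch of one way to deal with it, assuming the input looks like the sample in the question: chomp the line before splitting. Here a StringIO stands in for STDIN, and skipping the lone '=' line that ends the output (which otherwise produces the empty-string key seen above) is an optional extra rather than something the original script does.

```ruby
require 'stringio'

# Stand-in for STDIN: two lines from the question plus the final lone '='.
input = StringIO.new("SEQUENCE_ID=example\nPRIMER_OPT_SIZE=18\n=\n")

primer3 = {}
while line = input.gets
  # chomp drops the line terminator before the split, so values stay clean
  key, value = line.chomp.split('=', 2)
  # skip blank lines (key is nil) and the lone '=' line (key is empty)
  next if key.nil? || key.empty?
  primer3[key.downcase] = value
end

p primer3
```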
Upvotes: -1