port5432

Reputation: 6381

Split a complex file into a hash

I am running a command line program, called Primer 3. It takes an input file and returns data to standard output. I am trying to write a Ruby script which will accept that input, and put the entries into a hash.

The results returned are below. I would like to split the data on the '=' sign, so that the hash would look something like this:

{:SEQUENCE_ID => "example", :SEQUENCE_TEMPLATE => "GTAGTCAGTAGACNAT..etc", :SEQUENCE_TARGET => "37,21" etc }

I would also like to lowercase the keys, i.e.:

 {:sequence_id => "example", :sequence_template => "GTAGTCAGTAGACNAT..etc", :sequence_target => "37,21" etc }

This is my current script:

#!/usr/bin/ruby
puts 'Primer 3 hash'

primer3 = {}
while line = gets do
  name, height = line.split(/\=/)
  primer3[name] = height.to_i
end

puts primer3

It is returning this:

Primer 3 hash
{"SEQUENCE_ID"=>0, "SEQUENCE_TEMPLATE"=>0, "SEQUENCE_TARGET"=>37, "PRIMER_TASK"=>0,     "PRIMER_PICK_LEFT_PRIMER"=>1, "PRIMER_PICK_INTERNAL_OLIGO"=>1,  "PRIMER_PICK_RIGHT_PRIMER"=>1, "PRIMER_OPT_SIZE"=>18, "PRIMER_MIN_SIZE"=>15, "PRIMER_MAX_SIZE"=>21, "PRIMER_MAX_NS_ACCEPTED"=>1, "PRIMER_PRODUCT_SIZE_RANGE"=>75, "P3_FILE_FLAG"=>1, "SEQUENCE_INTERNAL_EXCLUDED_REGION"=>37, "PRIMER_EXPLAIN_FLAG"=>1, "PRIMER_THERMODYNAMIC_PARAMETERS_PATH"=>0, "PRIMER_LEFT_EXPLAIN"=>0, "PRIMER_RIGHT_EXPLAIN"=>0, "PRIMER_INTERNAL_EXPLAIN"=>0, "PRIMER_PAIR_EXPLAIN"=>0, "PRIMER_LEFT_NUM_RETURNED"=>0, "PRIMER_RIGHT_NUM_RETURNED"=>0, "PRIMER_INTERNAL_NUM_RETURNED"=>0, "PRIMER_PAIR_NUM_RETURNED"=>0, ""=>0}

Data source

SEQUENCE_ID=example
SEQUENCE_TEMPLATE=GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG
SEQUENCE_TARGET=37,21
PRIMER_TASK=pick_detection_primers
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=1
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=21
PRIMER_MAX_NS_ACCEPTED=1
PRIMER_PRODUCT_SIZE_RANGE=75-100
P3_FILE_FLAG=1
SEQUENCE_INTERNAL_EXCLUDED_REGION=37,21
PRIMER_EXPLAIN_FLAG=1
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/
PRIMER_LEFT_EXPLAIN=considered 65, too many Ns 17, low tm 48, ok 0
PRIMER_RIGHT_EXPLAIN=considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35
PRIMER_INTERNAL_EXPLAIN=considered 0, ok 0
PRIMER_PAIR_EXPLAIN=considered 0, ok 0
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=0
=

$ primer3_core < example2 | ruby /Users/sean/Dropbox/bin/rb/read_primer3.rb

Upvotes: 3

Views: 202

Answers (3)

Phrogz

Reputation: 303215

For fun, here are a couple of purely-functional solutions. Both assume that you've already pulled your data from the file, e.g.

my_data = ARGF.read # read the file passed on the command line

This one feels sort of gross, but it is a (long) one-liner :)

hash = Hash[ my_data.lines.map{ |line|
  line.chomp.split('=',2).map.with_index{ |s,i| i==0 ? s.downcase.to_sym : s }
} ]

This one is two lines, but feels cleaner than using with_index:

keys,values = my_data.lines.map{ |line| line.chomp.split('=',2) }.transpose
hash = Hash[ keys.map(&:downcase).map(&:to_sym).zip(values) ]

Both of these are likely less efficient and certainly more memory-intensive than your already-accepted answer; iterating over the lines and mutating your hash as you go is the best approach. These non-mutating variations are just a mental exercise.
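If you happen to be on Ruby 2.6 or later, a third non-mutating variant is possible, because Array#to_h accepts a block there. This is only a sketch and not something the answers here relied on; the sample string is a shortened stand-in for the real Primer 3 output.

```ruby
# Shortened stand-in for the Primer 3 output (assumption: real data would come from ARGF.read)
my_data = "SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\nPRIMER_OPT_SIZE=18\n"

# Array#to_h with a block (Ruby 2.6+) builds the hash from [key, value] pairs
hash = my_data.lines.to_h do |line|
  key, value = line.chomp.split('=', 2)
  [key.downcase.to_sym, value]
end

p hash
```

Like the two variants above, this still reads everything into memory first, so the same efficiency caveat applies.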


Your final answer should use ARGF so that it accepts either filenames on the command line or data on STDIN. I would write it like so:

#!/usr/bin/ruby

module Primer3
  def self.parse( file )
    {}.tap do |primer3|
      # Process one line at a time, without reading it all into memory first
      file.each_line do |line|  
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

# Print the parsed hash when this file is run directly
p Primer3.parse( ARGF ) if __FILE__==$0

This way you can either run the file from the command line, passing a filename or piping data in via STDIN, or you can require this file and use the module function it defines in other code.
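As a rough illustration of the second use, here is how other code might call the module function. The module is repeated so that the sketch runs on its own; the StringIO object and its two sample lines are assumptions standing in for a real file or ARGF.

```ruby
require 'stringio'

# Repeated from above so that this sketch runs on its own
module Primer3
  def self.parse(file)
    {}.tap do |primer3|
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

# StringIO is a stand-in for an open file or ARGF (anything that responds to each_line)
sample = StringIO.new("SEQUENCE_ID=example\nPRIMER_PRODUCT_SIZE_RANGE=75-100\n")
result = Primer3.parse(sample)

puts result[:sequence_id]
puts result[:primer_product_size_range]
```

Because parse only calls each_line on its argument, an open File, ARGF, or a StringIO used in a test should all work the same way.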

Upvotes: 4

Arie Xiao

Reputation: 14082

#!/usr/bin/ruby
puts 'Primer 3 hash'

primer3 = {}
while line = gets do
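  # Split on the first '=' only, so the value keeps any later '=' characters
  key, value = line.split(/=/, 2)
  # Store the value as a string (not to_i) and strip the trailing newline
  primer3[key.downcase.to_sym] = value.chomp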
end

puts primer3
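For context, the pieces that matter here can be checked in isolation. The sketch below shows why to_i in the question produced mostly zeros, and what the limit of 2 on split does; the NOTE line is a hypothetical input, not something Primer 3 emits.

```ruby
# String#to_i reads leading digits only and returns 0 when there are none,
# which is where the zeros in the question's output came from
p "example".to_i   # => 0
p "37,21".to_i     # => 37 (stops at the comma)
p "75-100".to_i    # => 75 (stops at the hyphen)

# A limit of 2 keeps any later '=' inside the value (hypothetical line)
line = "NOTE=a=b\n"
key, value = line.split(/=/, 2)
p [key.downcase.to_sym, value.chomp]   # => [:note, "a=b"]
```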

Upvotes: 4

port5432

Reputation: 6381

OK, I have it (almost). The only problem is that each value ends with a trailing \n.

puts 'Primer 3 hash'

primer3 = {}
while line = gets do
  key, value = line.split(/\=/)
  puts key
  puts value
  primer3[key.downcase] = value
end

puts primer3

{"sequence_id"=>"example\n",  "sequence_template"=>"GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG\n", "sequence_target"=>"37,21\n", "primer_task"=>"pick_detection_primers\n", "primer_pick_left_primer"=>"1\n", "primer_pick_internal_oligo"=>"1\n", "primer_pick_right_primer"=>"1\n", "primer_opt_size"=>"18\n", "primer_min_size"=>"15\n", "primer_max_size"=>"21\n", "primer_max_ns_accepted"=>"1\n", "primer_product_size_range"=>"75-100\n", "p3_file_flag"=>"1\n", "sequence_internal_excluded_region"=>"37,21\n", "primer_explain_flag"=>"1\n", "primer_thermodynamic_parameters_path"=>"/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/\n", "primer_left_explain"=>"considered 65, too many Ns 17, low tm 48, ok 0\n", "primer_right_explain"=>"considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35\n", "primer_internal_explain"=>"considered 0, ok 0\n", "primer_pair_explain"=>"considered 0, ok 0\n", "primer_left_num_returned"=>"0\n", "primer_right_num_returned"=>"0\n", "primer_internal_num_returned"=>"0\n", "primer_pair_num_returned"=>"0\n", ""=>"\n"}
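The \n is there because gets returns each line with its terminator still attached. A minimal sketch of stripping it with chomp follows; it also skips the lone '=' line that ends a Primer 3 record, which is what produces the empty ""=>"\n" entry. Skipping that line is an assumption about what is wanted, and the input string is a shortened stand-in for the data arriving on STDIN.

```ruby
# Stand-in for the lines read by gets (assumption: real input arrives on STDIN)
input = "SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n=\n"

primer3 = {}
input.each_line do |line|
  key, value = line.chomp.split('=', 2)  # chomp drops the trailing newline
  next if key.nil? || key.empty?         # skip blank lines and the lone '=' terminator
  primer3[key.downcase] = value
end

p primer3
```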

Upvotes: -1
