John Dibling

Reputation: 101476

Constructing array of disparate types based on element in other array

How do I construct an array of different types given a comma-separated string and another array dictating the type?


By parsing CSV input taken from stdin, I have an array of column header Symbols:

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]

and a line of raw input:

raw = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

I would like to construct an array, cells, from the raw input, where each element of cells is of the type identified by the corresponding element in cols. What are the idiomatic Ruby-ish ways of doing this?


I have tried this, which works but doesn't really feel right.

1) First, define a class for each type which needs to be encapsulated:

class Sku
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
end

class Price
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = time
  end
end

2) Then, define conversion functions for each unique column type which needs to be converted:

def to_sku(raw)
  raw.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3])}
end

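def to_price(raw)
  # Same pattern as the :Last branch in step 4
  raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
end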
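To see what the patterns in steps 2 and 4 actually capture, here is a quick check against the sample fields. It uses regex literals equivalent to the quoted string patterns (\w? is the same as \w{0,1}, and \w{0,2} the same as \w{,2}). Because the match is unanchored, the leading $ in $JX.T.CA is simply skipped:

```ruby
# Sample fields taken from the raw line in the question.
sku_field   = "$JX.T.CA"
price_field = "933.36T 11:10:00.000"

# Regex literals equivalent to the question's string patterns.
SKU_RE   = /(\w+)\.(\w?)\.(\w{0,2})/
PRICE_RE = /(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/

# String#match returns a MatchData (or nil when nothing matches);
# #captures lists the parenthesized groups as strings.
sku_parts   = sku_field.match(SKU_RE).captures
price_parts = price_field.match(PRICE_RE).captures

p sku_parts    # => ["JX", "T", "CA"]
p price_parts  # => ["933.36", "T", "11:10:00.000"]
```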

3) Create an array of strings from the input:

cells = raw.split(",")

4) And finally modify each element of cells in-place by constructing the type dictated by the corresponding column header:

cells.each_index do |i|
    cells[i] = case cols[i]
        when :IndexSymbol
            to_sku(cells[i])
        when :PriceStatus
            cells[i].split(";").collect {|st| st.to_sym}
        when :UpdateExchange
            cells[i]
        when :Last
            cells[i].match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
        else
            puts "Unhandled column type (#{cols[i]}) from input string: \n#{cols}\n#{raw}"
            exit(-1)
    end
end

The parts that don't feel right are steps 3 and 4. What is a more idiomatic Ruby way to do this? I was imagining some kind of super-concise method like this, which exists only in my imagination:

cells = raw.split_using_convertor(",")
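No such method exists on String, but a helper with roughly that shape is easy to sketch. The method below is hypothetical (split_using_converters is not part of Ruby); since a String knows nothing about columns, it takes the column list and a hash of lambdas as extra arguments. To stay self-contained, the lambdas return the captured strings instead of Sku/Price objects:

```ruby
# Hypothetical helper (not part of Ruby): split +line+ on +sep+, pair each
# piece with its column, and run the converter registered for that column.
def split_using_converters(line, sep, cols, converters)
  line.split(sep).zip(cols).map do |cell, col|
    converters.fetch(col).call(cell) # KeyError if a column has no converter
  end
end

# Stand-ins for to_sku / to_price: return the captured strings, not objects.
converters = {
  IndexSymbol:    ->(s) { s.match(/(\w+)\.(\w?)\.(\w{0,2})/)&.captures },
  PriceStatus:    ->(s) { s.split(";").map(&:to_sym) },
  UpdateExchange: ->(s) { s },
  Last:           ->(s) { s.match(/(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/)&.captures }
}

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
raw  = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

cells = split_using_converters(raw, ",", cols, converters)
p cells
# => [["JX", "T", "CA"], [:Open], "T", ["933.36", "T", "11:10:00.000"]]
```

Keeping the converters in a Hash makes adding a column type a one-line change, and Hash#fetch turns an unknown column into a KeyError instead of a silent nil.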

Upvotes: 2

Views: 134

Answers (4)

Abe Voelker

Reputation: 31594

You could have the different types inherit from a base class and put the lookup knowledge in that base class. Then you could have each class know how to initialize itself from a raw string:

class Header
  @@lookup = {}

  def self.symbol(*syms)
    syms.each{|sym| @@lookup[sym] = self}
  end

  def self.lookup(sym)
    @@lookup[sym]
  end
end

class Sku < Header
  symbol :IndexSymbol
  attr_accessor :mRoot, :mExch, :mCountry

  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end

  def to_s
    "@#{mRoot}-#{mExch}-#{mCountry}"
  end

  def self.from_raw(str)
    str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
  end
end

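require 'time'  # provides Time.parse

class Price < Header
  symbol :Last, :Bid
  attr_accessor :mPrice, :mExchange, :mTime

  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    # Time.new does not parse a time-of-day string; Time.parse fills in today's date
    @mTime = Time.parse(time)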
  end

  def to_s
    "$#{mPrice}-#{mExchange}-#{mTime}"
  end

  def self.from_raw(raw)
    raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
  end
end

class SymbolList < Header
  symbol :PriceStatus
  attr_accessor :mSymbols

  def initialize(symbols)
    @mSymbols = symbols
  end

  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end

  def to_s
    mSymbols.to_s
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_accessor :mExch

  def initialize(exch)
    @mExch = exch
  end

  def self.from_raw(raw)
    new(raw)
  end

  def to_s
    mExch
  end
end

Then you can replace step #4 like so (CSV parsing not included):

cells = cells.each_index.map do |i|
  Header.lookup(cols[i]).from_raw(cells[i])
end
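A trimmed, self-contained sketch of the registry idea above, limited to two column types. One deliberate change from the answer: lookup uses Hash#fetch, so an unregistered column raises KeyError instead of returning nil and failing later with NoMethodError:

```ruby
# Trimmed sketch of the registry pattern above (two column types only).
class Header
  # Class variable: shared by Header and all of its subclasses,
  # so every subclass registers into the same table.
  @@lookup = {}

  # Called in a subclass body; self is that subclass.
  def self.symbol(*syms)
    syms.each { |sym| @@lookup[sym] = self }
  end

  # Changed from @@lookup[sym]: fetch raises KeyError for an unknown column.
  def self.lookup(sym)
    @@lookup.fetch(sym)
  end
end

class SymbolList < Header
  symbol :PriceStatus
  attr_reader :symbols

  def initialize(symbols)
    @symbols = symbols
  end

  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_reader :exch

  def initialize(exch)
    @exch = exch
  end

  def self.from_raw(str)
    new(str)
  end
end

cols  = [:PriceStatus, :UpdateExchange]
cells = ["Open;Halted", "T"]

decoded = cells.each_index.map { |i| Header.lookup(cols[i]).from_raw(cells[i]) }
p decoded.map(&:class)  # => [SymbolList, ExchangeIdentifier]
```

The class variable is what makes this work: the symbol calls run inside each subclass, yet they all write into the one table that Header.lookup reads.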

Upvotes: 2

John Dibling

Reputation: 101476

@AbeVoelker's answer steered me in the right direction, but I had to make a pretty major change because of something I failed to mention in the OP.

Some of the cells will be of the same type but will still have different semantics. Those semantic differences don't come into play here (and aren't elaborated on), but they do in the larger context of the tool I'm writing.

For example, there will be several cells of type Price: the :Last, :Bid, and :Ask columns. They all share the Price type, but they are still different enough that there can't be a single Header @@lookup entry for all Price columns.

So what I actually did was write a self-decoding class (credit to Abe for this key part) for each type of cell:

class Sku
    attr_accessor :mRoot, :mExch, :mCountry
    def initialize(root, exch, country)
        @mRoot = root
        @mExch = exch
        @mCountry = country
    end

    def to_s
        "@#{mRoot}-#{mExch}-#{mCountry}"
    end

    def self.from_raw(str)
        str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
    end
end

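require 'time'  # provides Time.parse

class Price
    attr_accessor :mPrice, :mExchange, :mTime
    def initialize(price, exchange, time)
        @mPrice = price
        @mExchange = exchange
        # Time.new does not parse a time-of-day string; Time.parse fills in today's date
        @mTime = Time.parse(time)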
    end
    def to_s
        "$#{mPrice}-#{mExchange}-#{mTime}"
    end
    def self.from_raw(raw)
        raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
    end
end

class SymbolList
    attr_accessor :mSymbols
    def initialize(symbols)
        @mSymbols = symbols
    end
    def self.from_raw(str)
        new(str.split(";").collect {|s| s.to_sym})
    end
    def to_s
        mSymbols.to_s
    end
end

class ExchangeIdentifier
    attr_accessor :mExch
    def initialize(exch)
        @mExch = exch
    end
    def self.from_raw(raw)
        new(raw)
    end
    def to_s
        mExch
    end
end

...Create a typelist, mapping each column identifier to the type:

ColumnTypes = {
    :IndexSymbol => Sku,
    :PriceStatus => SymbolList,
    :UpdateExchange => ExchangeIdentifier,
    :Last => Price,
    :Bid => Price
}

...and finally construct my Array of cells by calling the appropriate type's from_raw:

cells = raw.split(",").each_with_index.map do |cell, i|
    # fetch raises KeyError for a column with no registered type
    ColumnTypes.fetch(cols[i]).from_raw(cell)
end
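One way to keep that per-column meaning, sketched under the assumption that each decoded value should stay tagged with its column: zip the column symbols with the cells and build a Hash, so :Bid and :Ask remain distinguishable even though both hold Price objects. The Price class here is trimmed down, keeps the time as a string, and the three-price sample line is made up for illustration:

```ruby
# Trimmed stand-in for the answer's Price class; the time stays a String here.
class Price
  attr_reader :price, :exchange, :time

  def initialize(price, exchange, time)
    @price = price
    @exchange = exchange
    @time = time
  end

  def self.from_raw(raw)
    raw.match(/(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/) { |m| new(m[1], m[2], m[3]) }
  end
end

# Several columns map to the same class; the column symbol carries the meaning.
COLUMN_TYPES = { Last: Price, Bid: Price, Ask: Price }.freeze

cols = [:Last, :Bid, :Ask]
raw  = "933.36T 11:10:00.000,933.30T 11:10:00.000,933.40T 11:10:00.000"

# Keep each [column, value] pair so same-typed columns stay distinct.
cells = cols.zip(raw.split(",")).map { |col, cell|
  [col, COLUMN_TYPES.fetch(col).from_raw(cell)]
}.to_h

p cells[:Bid].price  # => "933.30"
```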

To my eyes, the result is clean and expressive, and more Ruby-ish than what I had originally done.

Complete example here.

Upvotes: 1

matt

Reputation: 79783

Ruby’s CSV library includes support for this sort of thing directly (as well as better handling of the actual parsing), although the docs are a bit awkward.

You need to provide a proc that will do your conversions for you, and pass it as an option to CSV.parse:

require 'csv'

converter = proc do |field, info|
  case info.header.strip # in case you have spaces after your commas
  when "IndexSymbol"
    field.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3]) }
  when "PriceStatus"
    field.split(";").collect { |st| st.to_sym }
  when "UpdateExchange"
    field
  when "Last"
    field.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
  else
    field # leave unrecognized columns as strings instead of turning them into nil
  end
end

Then you can parse the input string s (header line included) almost directly into the format you want:

c = CSV.parse(s, :headers => true, :converters => converter).by_row!.map do |row|
  row.map { |_, field| field }  # we only want the field now, not the header
end
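A self-contained version of the converter approach that runs without the Sku and Price classes: those branches return capture arrays instead. The input layout (a header line followed by the data line from the question) is an assumption. A converter that takes two parameters receives a CSV::FieldInfo, whose header is the column name when headers: true is set:

```ruby
require "csv"

# Assumed input: a header line, then the data line from the question.
input = <<~TEXT
  IndexSymbol,PriceStatus,UpdateExchange,Last
  $JX.T.CA,Open,T,933.36T 11:10:00.000
TEXT

# With two parameters, a converter also receives a CSV::FieldInfo;
# info.header is the column name because headers: true is set below.
# Sku/Price are replaced by plain capture arrays to keep this self-contained.
converter = lambda do |field, info|
  case info.header
  when "IndexSymbol" then field.match(/(\w+)\.(\w?)\.(\w{0,2})/)&.captures
  when "PriceStatus" then field.split(";").map(&:to_sym)
  when "Last"        then field.match(/(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/)&.captures
  else field # e.g. UpdateExchange stays a String
  end
end

rows = CSV.parse(input, headers: true, converters: [converter]).map(&:fields)
p rows.first
# => [["JX", "T", "CA"], [:Open], "T", ["933.36", "T", "11:10:00.000"]]
```

Here CSV::Row#fields drops the headers, which does the same job as the row.map { |_, field| field } step.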

Upvotes: 1

Rory O'Kane

Reputation: 30408

You can make the fourth step simpler with #zip, #map, and destructuring assignment:

cells = cells.zip(cols).map do |cell, col|
    case col
    when :IndexSymbol
        to_sku(cell)
    when :PriceStatus
        cell.split(";").collect {|st| st.to_sym}
    when :UpdateExchange
        cell
    when :Last
        cell.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
    else
        puts "Unhandled column type (#{col}) from input string: \n#{cols}\n#{raw}"
        exit(-1)
    end
end
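A small illustration of what zip and the two-parameter block do in that step, on made-up two-column data. Note that zip takes its length from the receiver: any cells beyond the column list are paired with nil, and the case statement would then route them to its else branch:

```ruby
cols  = [:IndexSymbol, :PriceStatus]
cells = ['$JX.T.CA', 'Open;Halted']

# zip pairs elements by position, taking its length from the receiver.
pairs = cells.zip(cols)
p pairs  # => [["$JX.T.CA", :IndexSymbol], ["Open;Halted", :PriceStatus]]

# A block with two parameters destructures each [cell, col] pair.
described = pairs.map { |cell, col| "#{col}=#{cell}" }
p described  # => ["IndexSymbol=$JX.T.CA", "PriceStatus=Open;Halted"]

# Extra cells beyond the column list are paired with nil.
p ['a', 'b'].zip([:x])  # => [["a", :x], ["b", nil]]
```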

I wouldn’t recommend combining that step with the splitting, because parsing a line of CSV is complicated enough to be its own step. See my comment for how to parse the CSV.
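To illustrate why splitting on commas is fragile: if a field is ever quoted and contains a comma (the line below is hypothetical), String#split breaks it apart, while CSV.parse_line from Ruby's csv library respects the quotes:

```ruby
require "csv"

# Hypothetical line whose second field is quoted and contains a comma.
line = '$JX.T.CA,"Open,Halted",T'

naive  = line.split(",")      # splits inside the quotes too
parsed = CSV.parse_line(line) # honors the quoting

p naive   # => ["$JX.T.CA", "\"Open", "Halted\"", "T"]
p parsed  # => ["$JX.T.CA", "Open,Halted", "T"]
```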

Upvotes: 2
