Reputation: 34523
Our program builds a master hash where each key is a symbol representing an ID (about 10-20 characters) and each value is an empty hash. The master hash holds about 800K entries, yet Ruby's memory usage is approaching 400MB. That works out to roughly 500 bytes per key/value pair (symbol plus empty hash).
Is this normal for Ruby?
Code below:
def load_app_ids
  cols = get_columns AppFile
  id_col = cols[:application_id]
  each_record AppFile do |r|
    @apps[r[id_col].intern] = {}
  end
end
# Takes a line, strips the record separator, and returns
# an array of fields
def split_line(line)
  line.gsub(RecordSeperator, "").split(FieldSeperator)
end
# Run a block on each record in a file, up to
# @limit records
def each_record(filename, &block)
  i = 0
  path = File.join(@dir, filename)
  # Block form ensures the file handle is closed when iteration ends
  File.open(path, "r") do |f|
    f.each_line(RecordSeperator) do |line|
      # Get the line split into columns unless it is
      # a comment
      block.call split_line(line) unless line =~ /^#/
      # This import can take a loooong time.
      print "\r#{i}" if (i += 1) % 1000 == 0
      break if @limit and i >= @limit
    end
  end
  print "\n" if i > 1000
end
# Return map of column name symbols to column number
def get_columns(filename)
  path = File.join(@dir, filename)
  description = split_line(File.open(path, &:readline))
  # Strip the leading comment character
  description[0].gsub!(/^#/, "")
  # Return map of symbol to column number; each_with_index avoids
  # rescanning the array with index(str) for every column
  Hash[description.each_with_index.map { |str, idx| [str.intern, idx] }]
end
Upvotes: 1
Views: 389
Reputation: 27207
I would say this is normal for Ruby. I don't have metrics for the space used by each data structure, but in general plain Ruby handles this kind of large structure poorly. It has to allow for keys and values being any kind of object; that is very flexible for high-level coding, but inefficient when you don't need such arbitrary control.
If I do this in irb
h = {}
800000.times { |x| h[("test" + x.to_s).to_sym] = {} }
I get a process with 197 MB used.
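You can get a rough per-entry figure without watching the process from outside, using the stdlib objspace extension (numbers vary by Ruby version and platform; 100K entries here to keep it quick):

```ruby
require 'objspace'

# Measure live heap before and after building a hash shaped like
# the one in the question: symbol keys mapping to empty hashes.
GC.start
before = ObjectSpace.memsize_of_all

h = {}
100_000.times { |x| h[("test" + x.to_s).to_sym] = {} }

GC.start
after = ObjectSpace.memsize_of_all

per_entry = (after - before) / 100_000
puts "approx bytes per entry: #{per_entry}"
```

Each empty Hash is a full heap object on its own, so the cost per entry is much more than the key plus a slot.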
Your process has claimed more space because it also created large numbers of temporary objects during processing - a new string and array for every row. Ruby will eventually clean these up - but that doesn't happen immediately, and the memory is not returned to the OS immediately either.
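You can see how many temporaries a split_line-style pipeline produces with GC.stat (a sketch with literal separators standing in for your RecordSeperator/FieldSeperator constants):

```ruby
# Each iteration allocates a fresh String (gsub) plus an Array and its
# element Strings (split). All become garbage immediately, but the GC
# reclaims them lazily, so peak usage sits well above the live set.
before = GC.stat(:total_allocated_objects)
100_000.times { "a|b|c\n".gsub("\n", "").split("|") }
allocated = GC.stat(:total_allocated_objects) - before
puts "temporary objects allocated: #{allocated}"
```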
Edit: I should add that I have been working with large data structures of various kinds in Ruby. The general approach, if you need them, is to find something coded in native extensions (or ffi) where the code can take advantage of restricted types in an array. The narray gem is a good example of this for numeric arrays, vectors, matrices etc.
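As a stdlib-only illustration of the same idea (restricted element types in one contiguous buffer), Array#pack stores 800K integers at 4 bytes each instead of one boxed object per element:

```ruby
# Pack 800K integers into a single binary String as 32-bit signed ints.
ids = (0...800_000).to_a
packed = ids.pack("l*")

puts packed.bytesize           # 3_200_000 bytes, about 3 MB total
puts packed.unpack("l*")[123]  # round-trips back to 123
```

This only works when every element has the same fixed-size type, which is exactly the restriction that makes narray and similar native-extension libraries efficient.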
Upvotes: 1