Reputation: 569
I need to pull down the result of git ls-remote into an array, then convert that array to a hash like this: {commit_hash => reference}. Occasionally, two commit hashes are identical (but have different references, perhaps). So I get this kind of thing:
["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
"8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",
"3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
"d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
"19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
"906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"]
which I want to convert to:
{"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/auto"...}
but master and auto have the same hash, so one of them gets dropped in the conversion.
How do I either 1.) get a list of the values which were dropped in the conversion, or 2.) make the keys unique by adding a special character to the key, like a *?
Upvotes: 2
Views: 2044
Reputation: 110675
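You gave two options for what you want to do: get a list of the values that were dropped in the conversion, or make the keys unique by adding a special character to the key.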
I think the second approach is a bad idea, for a couple of reasons: a) you would need a way of modifying the key that allows for the possibility of there being multiple duplicates; and b) making connections between the original and the duplicates would be awkward. Also, it would be just plain ugly.
I see others have suggested a third possibility: changing the form of the resulting hash, so that the values are arrays of strings. That might serve you well, but it is not what you asked for, so I chose to build a list of the values that are dropped; i.e., all but the first.
Code
def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{},[]]) { |(k,v),(h,ex)|
    h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov } }
end
Example
create_hash_and_save_extras(arr)
#=> [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
# "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
# "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
# "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1",
# "906dfe6eebff832baf0f92683d751432fcc98ab7"=>"refs/heads/regression"},
# [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/master"}]]
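To get just the list of dropped references (the first option in the question), the two-element return value can be destructured. This is a minimal sketch, assuming the method shown above and a shortened, made-up sample array; the names refs_by_hash, extras and dropped_refs are only illustrative.

```ruby
# Same method as above, repeated so the snippet runs on its own.
def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{}, []]) do |(k, v), (h, ex)|
    h.update(k => v) { |key, ov, nv| ex << { key => nv }; ov }
  end
end

# Shortened sample data (illustrative, not real git output).
arr = ["aaa", "refs/heads/auto",
       "bbb", "refs/heads/elab",
       "aaa", "refs/heads/master"]

# with_object returns the [hash, extras] pair, so it can be split in one step.
refs_by_hash, extras = create_hash_and_save_extras(arr)

# Each extra is a one-pair hash; collect only the reference names.
dropped_refs = extras.flat_map(&:values)

p refs_by_hash["aaa"]  #=> "refs/heads/auto"
p dropped_refs         #=> ["refs/heads/master"]
```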
Explanation
Sending Enumerable#each_slice to arr returns an enumerator:
enum1 = arr.each_slice(2)
#=> #<Enumerator: [
# "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
# "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
# ...
# "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
# ]:each_slice(2)>
Enumerator#with_object is then sent to enum1 with an array as its argument. That array consists of an initially-empty hash (represented by the block variable h) and an initially-empty array for the "extras" (represented by the block variable ex). The result is another enumerator, which you can think of as a "compound enumerator" (note the reference to each_slice(2)>:with_object([{}, []]) below).
enum2 = enum1.with_object([{},[]])
#=> #<Enumerator: #<Enumerator: [
# "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
# "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
# ...
# "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
# ]:each_slice(2)>:with_object([{}, []])>
We can convert enum2 to an array to see what it will be passing into its block:
enum2.to_a
#=> [[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"],
# [{}, []]],
# [["8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks"],
# [{}, []]],
# [["3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab"],
# [{}, []]],
# [["d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1"],
# [{}, []]],
# [["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master"],
# [{}, []]],
# [["906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"],
# [{}, []]]]
The first element that enum2 passes into its block is
[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"], [{}, []]]
The block variables are therefore assigned as follows:
k = "19d97e408ee3f993745b053e281ac9dc69519e06"
v = "refs/heads/auto"
h = {}
ex = []
We now use Hash#update (aka Hash#merge!) to merge {k=>v} into h (h initially being empty). Therefore
h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov }
becomes
h.update({"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"})
followed by the block
{ |k, ov, nv| ex << {k=>nv}; ov }
but the block only applies when the hash being merged into (h) and the hash being merged (update's argument) share the same key k, in which case ov and nv are the values associated with that key in h and in the hash being merged, respectively. The merged value for key k will be whatever is returned by the block. The block does not run here, because h is still empty; it will apply when we encounter a duplicate key.
So now
h #=> {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"}
We continue in this way for each of the other elements of enum2. When we encounter
k = "19d97e408ee3f993745b053e281ac9dc69519e06"
v = "refs/heads/master"
h = {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
"8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
"3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
"d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1"}
we find that k is already in the merged hash h, so the block is evaluated to determine the value of k in the merged hash h. We want to keep the current value h[k], which is ov, so that is what the block returns. First, however, we append the duplicate, expressed as a hash, to the (still empty) array ex:
ex << {"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/master"}
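The block form of Hash#update can also be tried in isolation. The following is a small sketch with made-up keys and values (h, collisions, old_val and new_val are illustrative names); it shows that the block runs only when a key collides and that its return value becomes the stored value.

```ruby
h = { "a" => 1 }
collisions = []

# "b" is a new key, so the block is not called and the pair is simply added.
h.update("b" => 2) { |key, old_val, new_val| collisions << [key, old_val, new_val]; old_val }

# "a" already exists, so the block is called; returning old_val keeps the
# existing value, while the rejected new value is recorded in collisions.
h.update("a" => 99) { |key, old_val, new_val| collisions << [key, old_val, new_val]; old_val }

p h["a"]      #=> 1
p h["b"]      #=> 2
p collisions  #=> [["a", 1, 99]]
```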
Upvotes: 2
Reputation: 118271
I hope you like this:
ary = [
"19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
"8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",
"3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
"d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
"19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
"906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
]
array_hash = ary.each_slice(2).with_object(Hash.new { |h,k| h[k] = []}) do |(k,v),hash|
  hash[k] << v
end
# The main advantage here is that you don't lose any data; every reference is kept, and you
# can use it as you need. I think this is a better approach for your situation.
array_hash
# => {"19d97e408ee3f993745b053e281ac9dc69519e06"=>
# ["refs/heads/auto", "refs/heads/master"],
# "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>["refs/heads/callout_hooks"],
# "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>["refs/heads/elab"],
# "d38a9a26ef887c08b306bdab210b39882f58e587"=>["refs/heads/elab_6.1"],
# "906dfe6eebff832baf0f92683d751432fcc98ab7"=>["refs/heads/regression"]}
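If the original {commit_hash => reference} shape is still needed, it can be derived from the grouped hash afterwards. This is a sketch, assuming Ruby 2.4 or later (for Hash#transform_values) and shortened, made-up sample data; first_refs and dropped are illustrative names.

```ruby
ary = ["aaa", "refs/heads/auto",
       "bbb", "refs/heads/elab",
       "aaa", "refs/heads/master"]

# Group every reference under its commit hash, as in the code above.
array_hash = ary.each_slice(2).with_object(Hash.new { |h, k| h[k] = [] }) do |(k, v), hash|
  hash[k] << v
end

# Keep only the first reference seen for each commit hash.
first_refs = array_hash.transform_values(&:first)

# Everything after the first reference is what keeping only the first leaves out.
dropped = array_hash.flat_map { |_commit, refs| refs.drop(1) }

p first_refs["aaa"]  #=> "refs/heads/auto"
p dropped            #=> ["refs/heads/master"]
```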
Upvotes: 9
Reputation: 23939
If you make a hash of hash_value => array of refs, you'll keep everything:
array = ["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
"8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",
"3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
"d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
"19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
"906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
]
array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }
Looks like Arup and I were thinking the same way...
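The same grouping can also be written with Enumerable#group_by, which some readers may find easier to follow than reduce. This is only an alternative sketch, assuming Ruby 2.4 or later for Hash#transform_values and shortened, made-up sample data; grouped is an illustrative name.

```ruby
array = ["aaa", "refs/heads/auto",
         "bbb", "refs/heads/elab",
         "aaa", "refs/heads/master"]

grouped = array.each_slice(2)                                   # [hash, ref] pairs
               .group_by(&:first)                               # pairs keyed by commit hash
               .transform_values { |pairs| pairs.map(&:last) }  # keep only the refs

p grouped["aaa"]  #=> ["refs/heads/auto", "refs/heads/master"]
p grouped["bbb"]  #=> ["refs/heads/elab"]
```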
Upvotes: 5