Jonas Palačionis

Reputation: 4842

Comparing bit representation of objects in Python

I am watching a video named The Mighty Dictionary which has the following code:

k1 = bits(hash('Monty'))
k2 = bits(hash('Money'))
diff = ('^ '[a==b] for a,b in zip(k1,k2))
print(k1,k2,''.join(diff))

As I understand it, bits is not a built-in function in Python but a helper the speaker wrote himself, similar to format(x, 'b'). Or is it something that existed in Python 2? (I've never written any Python 2 code.)

I've tried to accomplish the same, get the bits representation of the strings and check where the bits differ:

k1 = format(hash('Monty'),'b')
k2 = format(hash('Money'),'b')
diff = ('^ ' [a==b] for a,b in zip(k1,k2))
print(k1,'\n',k2,'\n',''.join(diff))

I do get the expected result:

UPDATED Had to shift the first line by 1 space to match the symbols

 110111010100001110100101100000100110111111110001001101111000110 
 -1000001111101001011101001010101101000111001011011000011110100 
 ^  ^^^  ^ ^^ ^^^   ^^^^^^^ ^ ^^^^^  ^^   ^^  ^^^^^^^ ^   ^ ^^^

Also, the two bit strings are not the same length. I expected every hash to take the same number of bits regardless of the string (64 in my case), but I get 63 and 62:

print(len(format(hash('Monty'),'b')))
print(len(format(hash('Money'),'b')))

63
62

So, to sum up my question:

  1. Is bits a built-in function in Python 2?
  2. Is the following the recommended way to look at the bit representation of an object?
def fn():
    pass

print(format(hash(fn),'b'))
# -111111111111111111111111111111111101111000110001011100000000101
  3. Shouldn't every object's hash be represented by the same number of bits, determined by the processor? If I run the following code several times, I get these results:
def fn():
    pass

def nf():
    pass

print(format(hash(fn),'b'))
print(format(hash(nf),'b'))

# first time
# 10001001010011010111110000100
# -111111111111111111111111111111111101110110101100101000001000001

# second time
# 10001001010011010111111101010
# 10001001010011010111110000100

# third time
# 10001001010011010111101010001
# -111111111111111111111111111111111101110110101100101000001000001

Upvotes: 0

Views: 177

Answers (1)

Barmar

Reputation: 781804

  1. No, bits is not a built-in function in Python 2 or Python 3.
  2. By default, format() doesn't show leading zeroes. Use a format string such as 064b to pad the number to a 64-character field with leading zeroes. Hash values are 64 bits wide on a typical 64-bit build, so a width of 32 would be too narrow to pad them.
>>> format(hash('Monty'), '064b')
'0001001000100011010010110101101101011000010101011100110001010001'

Another problem you're running into is that hash() can return negative numbers, and format() prints those with a minus sign rather than as two's complement bits. His bits() function probably shows the two's complement bits instead. You can get the same effect by normalizing the input:

def bits(n):
    # Map a negative hash to its unsigned 64-bit two's complement value.
    if n < 0:
        n = 2**64 + n
    return format(n, '064b')
  3. Every time you run the code, you define new fn and nf functions. Different functions will not necessarily have the same hash code, even if they have the same name.

If you don't redefine the functions, you should get the same hash codes each time.

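Hashing strings and numbers depends only on their contents (although string hashes are salted per process since Python 3.3, so they change between runs unless PYTHONHASHSEED is set), whereas hashing objects such as functions depends on the specific instance.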
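
Putting the points above together, here is a minimal sketch of what a bits() helper could look like. It is not the function from the video; the name diff_marks and the use of sys.hash_info.width to pick the field width are my own assumptions. Masking with (1 << width) - 1 turns a negative hash into its unsigned two's complement value, so every result has the same length and the strings line up position by position.

```python
import sys

# Width of hash() results in bits on this interpreter
# (64 on a typical 64-bit CPython build).
HASH_BITS = sys.hash_info.width


def bits(n, width=HASH_BITS):
    """Return n as a fixed-width two's complement bit string."""
    # The mask maps a negative n to its unsigned two's complement value.
    return format(n & ((1 << width) - 1), f'0{width}b')


def diff_marks(a, b):
    """Return a string with '^' under every position where a and b differ."""
    return ''.join(' ' if x == y else '^' for x, y in zip(a, b))


k1 = bits(hash('Monty'))
k2 = bits(hash('Money'))
print(k1)
print(k2)
print(diff_marks(k1, k2))
```

The exact bits printed will vary from run to run because string hashes are salted, but both lines will always be HASH_BITS characters long.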

Upvotes: 1
