Reputation: 108
I'm working on a project involving BitTorrent, where I receive an bitfield as a python string. For example:
bitfield = "000001110100111000110101100010"
I would like to be able to convert the python string into a format so it can be inserted as is into a varbinary(max) column of a MSSQL database using PYODBC. If I try to insert it as a string as is, it of course complains of an illegal conversion error.
Note PYODBC, expects a byte array or buffer as input for a varbinary field according to their documentation.
Any suggestions would be appreciated.
Upvotes: 2
Views: 1040
Reputation: 128
Before I get to code, I'd like to make one recommendation: The 'bitfield' value isn't a length that can be divided into bytes. I'd suggest that anytime you're dealing with bit strings, you grow them in sizes of bytes (e.g. if len(bitfield)%8 != 0: print 'Make sure the bitfield can be fully represented by bytes!') to ensure there is no ambiguity in how fields are manipulated in different programming languages, different libraries within programming languages, and different databases. In other words, the database, python, the library I'm going to recommend, etc. are all going to either store or be able to represent this bitarray in the form of a byte array. If the bitarray provided doesn't divide evenly into bytes, one of three things will happen: 1) An error will be raised, (this is optimistic) 2) The bitarray will be auto-magically left padded. 3) The bitarray will be auto-magically right padded.
I'd recommend using a bitstring library of some sort. I have used python-bitstring for this purpose. I didn't take the time to deal with ODBC here, but the idea is basically the same, and leverages srgerg's answer:
Examples:
#!/usr/bin/python
import pymssql
from binascii import hexlify
from bitstring import BitArray
dbconninfo = {'host': 'hostname', 'user': 'username', 'password': 'secret', 'database': 'bitexample', 'as_dict': True}
conn = pymssql.connect(**dbconninfo)
cursor = conn.cursor()
bitfield = "000001110100111000110101100010"
ba = BitArray(bin=bitfield)
print '%32d (bitfield -> BitArray -> int)' % ba.int
cursor.execute("CREATE TABLE bin_test (bin_col varbinary(max) )")
cursor.execute("INSERT INTO bin_test values (%s)", (ba.int,))
cursor.execute("SELECT bin_col FROM bin_test")
results = cursor.fetchone()['bin_col'] # results now contains binary packed data '\x01\xd3\x8db'
conn.rollback()
results_int = int(hexlify(results),16)
print '%32d (bitfield -> BitArray -> int -> DB (where data is binary packed) -> unpacked with hexlify -> int)' % results_int
print '%32s (Original bitfield)' % bitfield
from_db_using_ba_hexlify_and_int_with_length = BitArray(int=int(hexlify(results),16), length=30).bin
print '%32s (From DB, decoded with hexlify, using int to instantiate BitArray, specifying length of int as 30 bits, out as bin)' %
from_db_using_ba_hexlify_and_int_with_length
from_db_using_ba_hex = BitArray(hex=hexlify(results)).bin # Can't specify length with hex
print '%32s (From DB, decoded with hexlify, using hex to instantiate BitArray, can not specify length, out as bin)' % from_db_using_ba_hex
from_db_using_ba_bytes_no_length = BitArray(bytes=results).bin # Can specify length with bytes... that's next.
print '%32s (From DB, using bytes to instantiate BitArray, no length specified, out as bin)' % from_db_using_ba_bytes_no_length
from_db_using_ba_bytes = BitArray(bytes=results,length=30).bin
print '%32s (From DB, using bytes to instantiate BitArray, specifying length of bytes as 30 bits, out as bin)' % from_db_using_ba_bytes
from_db_using_hexlify_bin = bin(int(hexlify(results),16))
print '%32s (from DB, decoded with hexlify -> int -> bin)' % from_db_using_hexlify_bin
from_db_using_hexlify_bin_ba = BitArray(bin=bin(int(hexlify(results),16))).bin
print '%32s (from DB, decoded with hexlify -> int -> bin -> BitArray instantiated with bin)' % from_db_using_hexlify_bin
from_db_using_bin = bin(int(results,16))
print '%32s (from DB, no decoding done, using bin)' % from_db_using_bin
The output of this is:
30641506 (bitfield -> BitArray -> int)
30641506 (bitfield -> BitArray -> int -> DB (where data is binary packed) -> unpacked with hexlify -> int)
000001110100111000110101100010 (Original bitfield)
000001110100111000110101100010 (From DB, decoded with hexlify, using int to instantiate BitArray, specifying length of int as 30 bits, out as bin)
00000001110100111000110101100010 (From DB, decoded with hexlify, using hex to instantiate BitArray, can not specify length, out as bin)
00000001110100111000110101100010 (From DB, using bytes to instantiate BitArray, no length specified, out as bin)
000000011101001110001101011000 (From DB, using bytes to instantiate BitArray, specifying length of bytes as 30 bits, out as bin)
0b1110100111000110101100010 (from DB, decoded with hexlify -> int -> bin)
0b1110100111000110101100010 (from DB, decoded with hexlify -> int -> bin -> BitArray instantiated with bin)
Traceback (most recent call last):
File "./bitexample.py", line 38, in <module>
from_db_using_bin = bin(int(results,16))
ValueError: invalid literal for int() with base 16: '\x01\xd3\x8db'
Note that since you don't have a bitstring that can be directly broken down into bytes (it's a string that represents 30 bits), the only way to get the exact same string was to specify a length, and even then the results were not consistent depending on how the BitArray was instantiated.
Upvotes: 2
Reputation: 19339
Assuming you're using a recent version of python you can take advantage of the standard library struct
module and the bin
function. Here's a quick example:
con = pyodbc.connect("...")
con.execute("CREATE TABLE bin_test ( bin_col varbinary(max) )")
con.execute("INSERT INTO bin_test VALUES (?)",
(int("000001110100111000110101100010", 2),))
result = con.execute("SELECT * FROM bin_test").fetchone()
bin(struct.unpack(">I", result[0])[0])
The result of the final statement is
'0b1110100111000110101100010'
which is the initial bitfield (with the leading zeroes removed).
You can find the documentation for the struct module on docs.python.org. The documentation for the bin function is also available at the same place.
Upvotes: 2