Reputation: 4561
Given a vocabulary ["NY", "LA", "GA"], how can one encode it in such a way that it becomes:
"NY" = 100
"LA" = 010
"GA" = 001
So if I do a lookup on "NY GA", I get 101.
Upvotes: 3
Views: 198
Reputation: 24133
To create a lookup dictionary, reverse the vocabulary, enumerate it, and take the power of 2:
>>> vocabulary = ["NY", "LA", "GA"]
>>> d = dict((word, 2 ** i) for i, word in enumerate(reversed(vocabulary)))
>>> d
{'NY': 4, 'GA': 1, 'LA': 2}
To query the dictionary:
>>> query = "NY GA"
>>> sum(code for word, code in d.items() if word in query.split())
5
If you want it formatted to binary:
>>> '{0:b}'.format(5)
'101'
edit: if you want a 'one liner':
>>> '{0:b}'.format(
...     sum(2 ** i
...         for i, word in enumerate(reversed(vocabulary))
...         if word in query.split()))
'101'
edit2: if you want padding, for example with six 'bits':
>>> '{0:06b}'.format(5)
'000101'
Upvotes: 1
Reputation: 13459
Another solution using numpy. It looks like you're trying to binary encode a dictionary, so the code below feels natural to me.
import numpy as np

def to_binary_representation(your_str="NY LA"):
    xs = np.array(["NY", "LA", "GA"])
    ys = 2**np.arange(3)[::-1]
    lookup_table = dict(zip(xs, ys))
    return bin(np.sum([lookup_table[k] for k in your_str.split()]))
Numpy isn't required for this, but it is probably faster if you have large arrays to work on. Without it, np.sum can be replaced by the builtin sum, and xs and ys can be replaced by their non-numpy equivalents.
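A sketch of what that non-numpy version might look like. The vocabulary parameter is an addition of mine for illustration; the rest mirrors the function above.

```python
def to_binary_representation(your_str="NY LA", vocabulary=("NY", "LA", "GA")):
    # Powers of two in descending order, e.g. [4, 2, 1] for three words.
    ys = [2 ** i for i in reversed(range(len(vocabulary)))]
    lookup_table = dict(zip(vocabulary, ys))
    return bin(sum(lookup_table[k] for k in your_str.split()))


print(to_binary_representation("NY LA"))  # 0b110
print(to_binary_representation("NY GA"))  # 0b101
```

Note that bin() adds a "0b" prefix and drops leading zeros, so "GA" alone comes back as 0b1 rather than 001.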
Upvotes: 1
Reputation: 337
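Or you can use powers of ten, so that the decimal digits of the result look like the bit pattern:
vocabulary = ["NY", "LA", "GA"]
i = pow(10, len(vocabulary) - 1)
dictVocab = dict()
for word in vocabulary:
    dictVocab[word] = i
    i //= 10  # integer division, so the codes stay ints in Python 3
yourStr = "NY LA"
result = 0
for word in yourStr.split():
    result += dictVocab[word]
# result is now 110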
Upvotes: 1
Reputation: 23480
vocab = ["NY", "LA", "GA"]
categorystring = '0'*len(vocab)
selectedVocabs = 'NY GA'
for sel in selectedVocabs.split():
    categorystring = list(categorystring)
    categorystring[vocab.index(sel)] = '1'
    categorystring = ''.join(categorystring)
This is the end result of my own testing. It turns out Python doesn't support string item assignment; somehow I thought it did.
Personally I think behzad's solution is better: numpy does a better job and is faster.
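If you'd rather skip the list/join round trip altogether, the string can be built in one pass over the vocabulary. A rough sketch of that idea (the name selected is just for illustration); unlike the loop above, a word that isn't in vocab is silently ignored here instead of raising ValueError.

```python
vocab = ["NY", "LA", "GA"]
selected = set("NY GA".split())

# One character per vocabulary entry: '1' if it was selected, '0' otherwise.
categorystring = "".join("1" if word in selected else "0" for word in vocab)
print(categorystring)  # 101
```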
Upvotes: 1
Reputation: 77941
You can use numpy.in1d:
>>> xs = np.array(["NY", "LA", "GA"])
>>> ''.join('1' if f else '0' for f in np.in1d(xs, 'NY GA'.split(' ')))
'101'
or:
>>> ''.join(np.where(np.in1d(xs, 'NY GA'.split(' ')), '1', '0'))
'101'
Upvotes: 1