Xavi Pi

Reputation: 91

Transform string into a bit array

How could I create a Python function that takes a string and returns a bit array marking which characters the string contains?

So if a word has an 'A' or 'a' and the first spot in the array corresponds to 'a', then the output array would have a 1 in its first spot:

[1, ...]

If a word has a 'B' or 'b', then the output array would have a 1 for its second spot. If a word has an 'a' and a 'b', then the output array would have a 1 in the first and second spots:

[1, 1, ...]

And so on. So the string "abba" would result in something like this:

[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Ideally, I would also be able to search for characters not in the alphabet, like ! and ?, too, and just add other bits to the array to represent those characters.

Any help would be welcome! Thanks a ton.

Upvotes: 2

Views: 124

Answers (2)

mad_

Reputation: 8273

Why not create a simple mapping dictionary?

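import string

# Map each lowercase letter to its position: 'a' -> 0, ..., 'z' -> 25
alphabet = string.ascii_lowercase
d = dict(zip(alphabet, range(26)))
# One bit per letter, all initially 0
a = [0] * 26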

The dictionary will look like this

{'a': 0,
 'b': 1,
 'c': 2,
 'd': 3,
 'e': 4,
 'f': 5,
 'g': 6,
 'h': 7,
 'i': 8,
 'j': 9,
 'k': 10,
 'l': 11,
 'm': 12,
 'n': 13,
 'o': 14,
 'p': 15,
 'q': 16,
 'r': 17,
 's': 18,
 't': 19,
 'u': 20,
 'v': 21,
 'w': 22,
 'x': 23,
 'y': 24,
 'z': 25}

Logic for lookup and updating the list

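# Lowercase the input so 'A' and 'a' set the same bit; set() drops duplicates
for i in set('aabbc?'.lower()):
    index_to_update = d.get(i)
    # Characters without an entry in d (such as '?') are skipped
    if index_to_update is not None:
        a[index_to_update] = 1
print(a)  # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]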
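
If it helps, the same lookup could be wrapped in a function. This is only a sketch: the name bits_from_mapping and the symbols parameter are made up for illustration and are not part of the snippet above. It lowercases the input so 'A' and 'a' set the same bit, and extra characters such as ! and ? can be appended to symbols.

```python
import string


def bits_from_mapping(text, symbols=string.ascii_lowercase):
    # Hypothetical helper: map each symbol to its position, e.g. {'a': 0, 'b': 1, ...}
    index_of = {ch: i for i, ch in enumerate(symbols)}
    bits = [0] * len(symbols)
    # Lowercase first so 'A' and 'a' set the same bit; set() drops duplicates
    for ch in set(text.lower()):
        i = index_of.get(ch)
        # Characters that are not in symbols are ignored
        if i is not None:
            bits[i] = 1
    return bits


print(bits_from_mapping("AbBa")[:4])  # [1, 1, 0, 0]
print(bits_from_mapping("a?", string.ascii_lowercase + "!?")[-2:])  # [0, 1]
```

The dictionary gives a constant-time lookup per character, so the cost of building it only pays off if you reuse it across many strings; for a one-off call it is mostly a matter of taste.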

Upvotes: 2

finefoot

Reputation: 11282

A very simple way to create such a list is:

def string_to_bit_array(text):
    # We don't care if upper or lower case
    text = text.lower()
    # Remove duplicate characters
    text = set(text)
    # Define alphabet characters
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Create list with zeros
    matches = [0] * len(alphabet)
    # Loop over every character of the text
    for character in text:
        # Skip this character if not in alphabet
        if character not in alphabet:
            continue
        # Find index of character in alphabet
        index = alphabet.find(character)
        # Set match index to one instead of zero
        matches[index] = 1
    # Return result
    return matches

print(string_to_bit_array("abba"))

This prints:

[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

You can just add further characters to alphabet if you need them:

alphabet = "abcdefghijklmnopqrstuvwxyz!?"
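
As a rough sketch of how that could look, here is a variant where the alphabet is passed in as a parameter (the default argument is my own addition for illustration). It relies on str.find returning -1 for characters that are not in the alphabet, so the separate membership check is not needed.

```python
def string_to_bit_array(text, alphabet="abcdefghijklmnopqrstuvwxyz!?"):
    # One slot per character of the (extended) alphabet
    matches = [0] * len(alphabet)
    # Lowercase and deduplicate the input characters
    for character in set(text.lower()):
        # find() returns -1 when the character is not in the alphabet
        index = alphabet.find(character)
        if index != -1:
            matches[index] = 1
    return matches


result = string_to_bit_array("Hi!")
print(len(result))   # 28
print(result[-2:])   # [1, 0]  ('!' is present, '?' is not)
```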

Upvotes: 2
