Reputation: 698
Suppose we are given a string
"AABCD"
with length n = 5, from an alphabet
{'A', 'B', 'C', 'D', 'E', 'F'}
with dimension len(alphabet) = 6. What is a Pythonic way of converting this string to a 5 x 6 matrix?
i.e.
#INPUT:
string = "AABCD"
alphabet = {'A', 'B', 'C', 'D', 'E', 'F'}
#OUTPUT
output =
A B C D E F
char 1[ 1 0 0 0 0 0 ]
char 2[ 1 0 0 0 0 0 ]
char 3[ 0 1 0 0 0 0 ]
char 4[ 0 0 1 0 0 0 ]
char 5[ 0 0 0 1 0 0 ]
I scoured other answers but have yet to find a question that is similar. Suggestions greatly appreciated!
Upvotes: 1
Views: 146
Reputation: 7204
Here's mine; it also works with different sizes, as shown:
df = pd.DataFrame(((pd.Series([*string])*len(alphabet)).str.split("", n=-1, expand=True).drop(columns=[0, len(alphabet)+1]).eq(list(sorted(alphabet)))*1)).rename(index=lambda x: f'Char {x+1}', columns=lambda x: f'{chr(x+64)}')
In [1661]: df
Out[1661]:
A B C D E F
Char 1 1 0 0 0 0 0
Char 2 1 0 0 0 0 0
Char 3 0 1 0 0 0 0
Char 4 0 0 1 0 0 0
Char 5 0 0 0 1 0 0
or
string = 'AABCDEEF'
alphabet = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}
df = pd.DataFrame(((pd.Series([*string])*len(alphabet)).str.split("", n=-1, expand=True).drop(columns=[0, len(alphabet)+1]).eq(list(sorted(alphabet)))*1)).rename(index=lambda x: f'Char {x+1}', columns=lambda x: f'{chr(x+64)}')
A B C D E F G H
Char 1 1 0 0 0 0 0 0 0
Char 2 1 0 0 0 0 0 0 0
Char 3 0 1 0 0 0 0 0 0
Char 4 0 0 1 0 0 0 0 0
Char 5 0 0 0 1 0 0 0 0
Char 6 0 0 0 0 1 0 0 0
Char 7 0 0 0 0 1 0 0 0
Char 8 0 0 0 0 0 1 0 0
Upvotes: 0
Reputation: 153460
You can use pandas to do this in very few lines:
import pandas as pd
string1 = "AABCD"
df = pd.Series([*string1]).str.get_dummies()
df = df.rename(index=lambda x: f'Char {x+1}')
print(df)
Output as pandas dataframe:
A B C D
Char 1 1 0 0 0
Char 2 1 0 0 0
Char 3 0 1 0 0
Char 4 0 0 1 0
Char 5 0 0 0 1
Note: a piece of syntactic sugar used here is unpacking a string into a list of characters. For example, [*'string'] results in ['s','t','r','i','n','g'].
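The output above only has columns for characters that actually occur in the string, so E and F are missing. One possible way to get the full 5 x 6 matrix from the question is to reindex the columns against the sorted alphabet. This is a minimal sketch that assumes no character in the string is '|', which str.get_dummies uses as its default separator:

```python
import pandas as pd

string = "AABCD"
alphabet = {'A', 'B', 'C', 'D', 'E', 'F'}

# Sets are unordered, so sort the alphabet to get a stable column order.
columns = sorted(alphabet)

# get_dummies only creates columns for characters present in the string;
# reindex adds the missing ones (here E and F), filled with zeros.
df = pd.Series(list(string)).str.get_dummies().reindex(columns=columns, fill_value=0)
df = df.rename(index=lambda x: f'Char {x+1}')
print(df)
```

If the plain matrix is needed rather than a DataFrame, df.to_numpy() returns it as a NumPy array.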
Upvotes: 1
Reputation: 15872
For your exact output:
string = "AABCD"
alphabet = ['A', 'B', 'C', 'D', 'E', 'F']
print(f'output = \n\t{" ".join(alphabet)}')
for ix, char in enumerate(string, start=1):
    x = [0]*len(alphabet)
    x[alphabet.index(char)] = 1
    print(f'char {ix} {x}'.replace(',',''))
Output:
output =
A B C D E F
char 1 [1 0 0 0 0 0]
char 2 [1 0 0 0 0 0]
char 3 [0 1 0 0 0 0]
char 4 [0 0 1 0 0 0]
char 5 [0 0 0 1 0 0]
Upvotes: 1
Reputation: 521
Another solution that is slightly neater and maybe more general:
import numpy as np
alphabet = ["A", "B", "C", "D", "E", "F"]
# Map each symbol to its column index
alphabet_dict = {}
for i, x in enumerate(alphabet):
    alphabet_dict[x] = i
string = ["A", "A", "B", "C", "D"]
# One row per character, one column per alphabet symbol
output = np.zeros((len(string), len(alphabet)))
for i, x in enumerate(string):
    output[i][alphabet_dict[x]] = 1
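The same idea can probably be written without the explicit fill loop by using NumPy integer array indexing. This is only a sketch: the names index_of and cols are made up for illustration, and it assumes every character of the string is in the alphabet.

```python
import numpy as np

string = "AABCD"
alphabet = ['A', 'B', 'C', 'D', 'E', 'F']

# Map each symbol to its column index, then look up the column for every character.
index_of = {ch: i for i, ch in enumerate(alphabet)}
cols = [index_of[ch] for ch in string]

# One row per character, one column per alphabet symbol.
output = np.zeros((len(string), len(alphabet)), dtype=int)

# Set output[row, cols[row]] = 1 for every row in a single assignment.
output[np.arange(len(string)), cols] = 1
print(output)
```

Using dtype=int gives integer 0/1 entries instead of the default floats.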
Hope this helps.
Upvotes: 1
Reputation: 163
You can use this code:
string = "AABCD"
# use a list instead of a set, because a set has no order and no index()
alphabet = ['A', 'B', 'C', 'D', 'E', 'F']
# resulting matrix
mat = []
# length of the alphabet, i.e. the size of each character's one-hot vector
l = len(alphabet)
for i in string:
    indx = alphabet.index(i)
    sub = [0] * l
    sub[indx] = 1
    mat.append(sub)
Output:
[[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0]]
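If the alphabet has to stay a set, as in the question, one option is to sort it first so that the column order is well defined. The sketch below wraps this in a hypothetical helper named one_hot (not part of any library) and uses a dict lookup instead of calling index() for every character:

```python
def one_hot(string, alphabet):
    """Return a len(string) x len(alphabet) list of lists (hypothetical helper)."""
    # Sets are unordered, so fix the column order by sorting.
    symbols = sorted(alphabet)
    position = {ch: i for i, ch in enumerate(symbols)}
    matrix = []
    for ch in string:
        row = [0] * len(symbols)
        row[position[ch]] = 1  # raises KeyError if ch is not in the alphabet
        matrix.append(row)
    return matrix

matrix = one_hot("AABCD", {'A', 'B', 'C', 'D', 'E', 'F'})
print(matrix)
```

The KeyError on unknown characters is a design choice in this sketch; silently emitting an all-zero row would be another reasonable option.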
Upvotes: 1
Reputation: 20490
A simple double for loop will do
string = "AABCD"
alphabet = ['A', 'B', 'C', 'D', 'E', 'F']
matrix = [[0 for _ in range(len(alphabet))] for _ in range(len(string))]
for i, s in enumerate(string):
    for j, a in enumerate(alphabet):
        matrix[i][j] = 1 if s == a else 0
print(matrix)
The output will be
[
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0]
]
It can also be done via itertools.product, but it won't look as clean as the for loop.
import itertools
string = "AABCD"
alphabet = ['A', 'B', 'C', 'D', 'E', 'F']
string_iter = zip(list(range(len(string))), string)
alphabet_iter = zip(list(range(len(alphabet))), alphabet)
matrix = [[0 for _ in range(len(alphabet))] for _ in range(len(string))]
for (i, s), (j, a) in itertools.product(string_iter, alphabet_iter):
    matrix[i][j] = 1 if s == a else 0
print(matrix)
Upvotes: 1