Reputation: 43
I have written a one-hot encoding program, and its current output is separate lists (generated by the for loop below):
onehot_encoded = list()
for value in integer_encoded:
    base = [0 for x in range(len(bases))]
    base[value] = 1
    onehot_encoded.extend(base)
print(onehot_encoded)
So far the example output looks like this, where one base is encoded per list:
[0, 1, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 1, 0]
Whereas I would like it to be written into one list of lists like so:
[[0, 1, 0, 0, 1, 0, 0, 0],
 [0, 1, 0, 0, 0, 0, 1, 0]]
I have tried to create a list to which the onehot_encoded outputs would be appended, but this does not work:
masterlist = list()
onehot_encoded = list()
for value in integer_encoded:
    base = [0 for x in range(len(bases))]
    base[value] = 1
    onehot_encoded.extend(base)
masterlist.append(onehot_encoded)
print(masterlist)
I would really appreciate any help in identifying where I am going wrong. I am a beginner in Python, and am finding it hard to identify the flaw in logic here.
EDIT: bases = "ACTG", so each base needs 4 integers to be encoded, e.g. "AG" would be [1, 0, 0, 0, 0, 0, 0, 1].
integer_encoded comes from an earlier piece of code, where bases is enumerated, so the input sequence is encoded as integers. For example, "AG" in this case would be "0, 3".
Upvotes: 0
Views: 62
Reputation: 15561
To get a list of lists, you likely need two nested loops (there are ways around it, but likely inconvenient).
EDIT: You did not show a specific case where you specify the input for your snippet, and the corresponding expected output.
Assuming from your comment (which is not complete but helps in this respect),
Input:
bases = "ACTG"
integer_encoded = [ "CA", "CT" ]
Expected output:
[[0, 1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1, 0]]
This code produces the result:
bases = "ACTG"
#integer_encoded_pairs = [ "CA", "CT" ]
integer_encoded_pairs = [ "10", "12" ]
masterlist = list()
for integer_encoded in integer_encoded_pairs:
    onehot_encoded = list()
    for value in integer_encoded:
        base = [0 for x in range(len(bases))]
        base[int(value)] = 1
        onehot_encoded.extend(base)
    masterlist.append(onehot_encoded)
print(masterlist)
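For comparison, the same nested-loop logic can be wrapped in a small helper function. This is only a sketch: the name one_hot_sequences and the parameter n_classes are made up for illustration, and it assumes each sequence is an iterable of integer indices or digit characters (as in the "10", "12" input above).

```python
def one_hot_sequences(sequences, n_classes):
    """Return one flat one-hot list per sequence (hypothetical helper)."""
    result = []
    for seq in sequences:
        encoded = []
        for value in seq:
            # int() lets the helper accept digit strings such as "10" as well as [1, 0]
            encoded.extend(1 if i == int(value) else 0 for i in range(n_classes))
        result.append(encoded)
    return result


bases = "ACTG"
print(one_hot_sequences(["10", "12"], len(bases)))
# [[0, 1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1, 0]]
```

The key point is the same as in the loop version: a fresh inner list is started for every sequence, and only the finished inner list is appended to the outer one.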
Upvotes: 0
Reputation: 867
Try this?
#sample data
bases = ["AG", "A"]
integer_encoded = [[0,3],[0]]
masterlist = list()
for encode in integer_encoded:
    onehot_encoded = list()
    for value in encode:
        base = [0 for x in range(4)]
        base[value] = 1
        onehot_encoded.extend(base)
    masterlist.append(onehot_encoded)
print(masterlist)
This will give you the required output: [[1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0]]
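If you start from the letter sequences rather than from integer lists, the integer-encoding step described in the question might look like the sketch below. It assumes the alphabet is "ACTG" as in the question's edit. The names sequences and base_to_index are illustrative and not taken from the original code.

```python
bases = "ACTG"
sequences = ["AG", "A"]  # illustrative input, not from the original post

# Map each base letter to its position in `bases`: A->0, C->1, T->2, G->3
base_to_index = {base: index for index, base in enumerate(bases)}

# One list of integer indices per sequence
integer_encoded = [[base_to_index[letter] for letter in seq] for seq in sequences]
print(integer_encoded)  # [[0, 3], [0]]
```

A list shaped like this can be fed directly to the nested loops above.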
Upvotes: 0
Reputation: 706
Try using append instead of extend:
onehot_encoded.append(base)
extend enlarges the initial list with the items from base, while append adds base as it is, as a single element.
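A small illustration of the difference, using made-up values. Note that append on its own produces one inner list per base, so with several bases per sequence the result is shaped differently from the flat 8-element lists shown in the question.

```python
base = [0, 1, 0, 0]

extended = []
extended.extend(base)   # adds the four items of base one by one
extended.extend(base)
print(extended)         # [0, 1, 0, 0, 0, 1, 0, 0]

appended = []
appended.append(base)   # adds base itself as a single element
appended.append(base)
print(appended)         # [[0, 1, 0, 0], [0, 1, 0, 0]]
```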
Upvotes: 1