agatha

Reputation: 43

How to write lists generated through for loop into one list?

I have written a one hot encoding program and the current output is separate lists (which are generated through the for loop below):

onehot_encoded = list()
for value in integer_encoded:
    base = [0 for x in range(len(bases))]
    base[value] = 1
    onehot_encoded.extend(base)
print(onehot_encoded)

So far the example output looks like this, where one base is encoded per list:

[0, 1, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 1, 0]

Whereas I would like it to be written into one list of lists like so:

[[0, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 0]]

I have tried to create a list to which onehot_encoded outputs would be appended, but this does not work:

masterlist = list()    
onehot_encoded = list()
for value in integer_encoded:
    base = [0 for x in range(len(bases))]
    base[value] = 1
    onehot_encoded.extend(base)
masterlist.append(onehot_encoded)
print(masterlist)

I would really appreciate any help in identifying where I am going wrong. I am a beginner in Python, and am finding it hard to identify the flaw in logic here.

EDIT: bases = "ACTG", so each base needs 4 integers to be encoded, e.g. "AG" would be [1, 0, 0, 0, 0, 0, 0, 1]. integer_encoded comes from an earlier piece of code that enumerates bases, so the input sequence is encoded as integers; for example, "AG" in this case would be "0, 3".

Upvotes: 0

Views: 62

Answers (3)

To get a list of lists, you likely need two nested loops: an outer loop over the sequences and an inner loop over the values in each sequence (there are ways around it, but they are likely inconvenient).

EDIT: You did not show a specific input for your snippet together with the corresponding expected output. Going by your comment (which is incomplete but helps here), I assume the following.
Input:

bases = "ACTG"
integer_encoded = [ "CA", "CT" ]

Expected output:

[[0, 1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1, 0]]

This code produces the result:

bases = "ACTG"
#integer_encoded_pairs = [ "CA", "CT" ]
integer_encoded_pairs = [ "10", "12" ]

masterlist = list()
for integer_encoded in integer_encoded_pairs:
    onehot_encoded = list()
    for value in integer_encoded:
        base = [0 for x in range(len(bases))]
        base[int(value)] = 1
        onehot_encoded.extend(base)
    masterlist.append(onehot_encoded)
print(masterlist)
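
If the sequences are available as letters rather than digits, the same nested-loop idea can be wrapped in a small helper. This is only a sketch: the function name one_hot_sequence and the list sequences are made up for illustration and do not appear in the question, and it assumes every character of a sequence occurs in bases.

```python
bases = "ACTG"

def one_hot_sequence(sequence, alphabet=bases):
    """Return a flat one-hot encoding: len(alphabet) integers per character."""
    encoded = []
    for char in sequence:
        vector = [0] * len(alphabet)
        # str.index raises ValueError if char is not in the alphabet
        vector[alphabet.index(char)] = 1
        encoded.extend(vector)
    return encoded

# Hypothetical input: one string per sequence
sequences = ["CA", "CT"]

# One flat encoded list per sequence, collected into a list of lists
masterlist = [one_hot_sequence(seq) for seq in sequences]
print(masterlist)
# [[0, 1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1, 0]]
```

The inner loop still uses extend so that each sequence stays one flat list, while the outer step collects one such list per sequence.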

Upvotes: 0

astrosyam

Reputation: 867

Try this?

#sample data
bases = ["AG", "A"]
integer_encoded = [[0,3],[0]]

masterlist = list()  

for encode in integer_encoded:  
    onehot_encoded = list()
    for value in encode:
        base = [0 for x in range(4)]
        base[value] = 1
        onehot_encoded.extend(base)
    masterlist.append(onehot_encoded)

print(masterlist) will give you the required output [[1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0]]

Upvotes: 0

alexey

Reputation: 706

Try using append instead of extend:

    onehot_encoded.append(base)

extend adds each item of base to the existing list one by one, while append adds base itself as a single nested element.
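
A quick way to see the difference is to apply both methods to the same small list. This is a minimal sketch with made-up variable names (flat, nested), not code from the question:

```python
base = [0, 1, 0, 0]

flat = []
flat.extend(base)    # adds the four integers one by one
flat.extend(base)

nested = []
nested.append(base)  # adds the list itself as one element
nested.append(base)

print(flat)    # [0, 1, 0, 0, 0, 1, 0, 0]
print(nested)  # [[0, 1, 0, 0], [0, 1, 0, 0]]
```

Note that with append inside the loop over values, the result holds one inner list per base. If each inner list should cover a whole sequence, as in the question's expected output, extend is probably still needed for the bases and append for the finished sequence.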

Upvotes: 1
