rex

Reputation: 47

Extracting specific elements from a list of strings and creating a new list?

I am a beginner in Python.

This is my issue. I have the following list:

lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC', 'UCCCUGAGACCUCAAGUGUGA',
       'ACACCUGGGCUCUCCGGGUACC', 'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA',
       'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC', 'AGGCAGUGUGGUUAGCUGGUUG',
       'ACGGCUACCUUCACUGCCACCC']

Now I need to extract the first letter of each of the 10 elements in lst and join them into a string in a new list. Then I need to do the same for the second letter, the third letter, and so on, until the last letter has been extracted from all ten elements and appended to the new list. The output has to look like this:

new_lst = ['UCUACUCUAA', 'GUCCAGAAGC', 'AACAUGUUGG', 'GUCCAACCCG', 'GGUCCAAAAC',
           'UCGUUUACGU', 'AAAGUGAAUA', 'GAGGCUGGGC', 'UUAGCACCUC', 'AUCCUAGCGU', ..., 'C']

I tried this code:

new_lst = []
for i in range(len(lst)):    # len(lst) is 10, the number of strings
    new_lst.append(''.join([x[i] for x in lst]))

The above code produces only the first 10 elements of new_lst, because the index runs from 0 to 9 (I misunderstood what the index means).
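To make the index mix-up concrete, here is a small sketch (Python 3, using a shortened hypothetical list `demo` instead of the full data). The number of strings and the number of character positions are different quantities, and a per-position loop has to run over the second one:

```python
# Hypothetical three-string list standing in for lst
demo = ['ABCD', 'EFG', 'HI']

num_strings = len(demo)                # 3: how many strings there are
max_len = max(len(s) for s in demo)    # 4: how many character positions there are

print(list(range(num_strings)))   # [0, 1, 2]    -> misses the last position
print(list(range(max_len)))       # [0, 1, 2, 3] -> covers every position
```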

Then I tried the following:

final = []
for j in range(1, len(lst), 1):
    new_lst = []
    for x in lst:
        c = len(x)
        for i in range(1, c, 1):
            while (i < len(x)):
                new_lst.append(x[i])
            else:
                new_lst.append("")
    final.append([new_lst])
print final

When I execute this code, it raises a MemoryError. I check the length because the elements in lst are not all the same length, and a different version of my code raised IndexError: string index out of range.
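The MemoryError comes from the `while (i<len(x))` loop: nothing inside it changes `i`, so the condition stays true and `new_lst.append(x[i])` runs forever, growing the list until memory runs out. A minimal sketch of the intended check (Python 3; `chars_at` is a hypothetical helper name) uses `if` so the test happens once per string:

```python
def chars_at(strings, i):
    """Hypothetical helper: the i-th character of each string, or '' if it is too short."""
    result = []
    for x in strings:
        if i < len(x):          # checked once per string, unlike the original while loop
            result.append(x[i])
        else:
            result.append('')   # pad short strings instead of raising IndexError
    return result

print(chars_at(['ABCD', 'EFG', 'HI'], 2))   # ['C', 'G', '']
```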

To dissect the problem, I first ran just the following code:

lst2 = []
for x in lst:
    c = len(x)
    print c
    for i in range(0, c, 1):
        print i,
        print x[i],

I got the following output:

22
0 U 1 G 2 A 3 G 4 G 5 U 6 A 7 G 8 U 9 A 10 G 11 G 12 U 13 U 14 G 15 U 16 A 17 U 18 A 19 G 20 U 21 U 22
0 C 1 U 2 A 3 U 4 G 5 C 6 A 7 A 8 U 9 U 10 U 11 U 12 C 13 U 14 A 15 C 16 C 17 U 18 U 19 A 20 C 21 C 21
0 U 1 C 2 C 3 C 4 U 5 G 6 A 7 G 8 A 9 C 10 C 11 U 12 C 13 A 14 A 15 G 16 U 17 G 18 U 19 G 20 A 22
0 A 1 C 2 A 3 C 4 C 5 U 6 G 7 G 8 G 9 C 10 U 11 C 12 U 13 C 14 C 15 G 16 G 17 G 18 U 19 A 20 C 21 C 22
0 C 1 A 2 U 3 A 4 C 5 U 6 U 7 C 8 C 9 U 10 U 11 A 12 C 13 A 14 U 15 G 16 C 17 C 18 C 19 A 20 U 21 A 21
0 U 1 G 2 G 3 A 4 A 5 U 6 G 7 U 8 A 9 A 10 A 11 G 12 A 13 A 14 G 15 U 16 A 17 U 18 G 19 U 20 A 22
0 C 1 A 2 U 3 C 4 A 5 A 6 A 7 G 8 C 9 G 10 G 11 U 12 G 13 G 14 U 15 U 16 G 17 A 18 U 19 G 20 U 21 G 23
0 U 1 A 2 U 3 C 4 A 5 C 6 A 7 G 8 C 9 C 10 A 11 G 12 C 13 U 14 U 15 U 16 G 17 A 18 U 19 G 20 U 21 G 22 C 22
0 A 1 G 2 G 3 C 4 A 5 G 6 U 7 G 8 U 9 G 10 G 11 U 12 U 13 A 14 G 15 C 16 U 17 G 18 G 19 U 20 U 21 G 22
0 A 1 C 2 G 3 G 4 C 5 U 6 A 7 C 8 C 9 U 10 U 11 C 12 A 13 C 14 U 15 G 16 C 17 C 18 A 19 C 20 C 21 C

As you can see, the loop goes through the first element character by character: after extracting the first character of the first element, it moves on to the second character of that same element. What I wanted was for the loop to move on to the second element of lst instead. Also, since the elements of the list have unequal lengths, is there a way to avoid the IndexError: string index out of range?
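The traversal order comes from which loop is on the outside: with `for x in lst` outermost, every character of one string is visited before the next string. A sketch with the loops swapped (Python 3; it reuses the question's `lst`, and the names `max_len` and `column` are illustrative) puts character positions in the outer loop and strings in the inner loop, with an `if` guard against IndexError:

```python
lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC', 'UCCCUGAGACCUCAAGUGUGA',
       'ACACCUGGGCUCUCCGGGUACC', 'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA',
       'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC', 'AGGCAGUGUGGUUAGCUGGUUG',
       'ACGGCUACCUUCACUGCCACCC']

max_len = max(len(x) for x in lst)   # number of character positions to visit

new_lst = []
for i in range(max_len):             # outer loop: character position
    column = ''
    for x in lst:                    # inner loop: each string in turn
        if i < len(x):               # skip strings too short to have position i
            column += x[i]
    new_lst.append(column)

print(new_lst[:2])   # ['UCUACUCUAA', 'GUCCAGAAGC']
```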

I guess I am missing something, and it might be something silly; sorry for being naive. If you could suggest different ways to accomplish this, that would be awesome. I read online about using arrays from the numpy module, but is there a way to do this without numpy?

Upvotes: 2

Views: 115

Answers (2)

John Coleman

Reputation: 52008

You can use itertools.zip_longest:

import itertools
[''.join(chars) for chars in itertools.zip_longest(*lst,fillvalue = '')]

Output:

['UCUACUCUAA', 'GUCCAGAAGC', 'AACAUGUUGG', 'GUCCAACCCG', 'GGUCCAAAAC', 'UCGUUUACGU', 'AAAGUGAAUA', 'GAGGCUGGGC', 'UUAGCACCUC', 'AUCCUAGCGU', 'GUCUUAGAGU', 'GUUCAGUGUC', 'UCCUCAGCUA', 'UUACAAGUAC', 'GAACUGUUGU', 'UCGGGUUUCG', 'ACUGCAGGUC', 'UUGGCUAAGC', 'AUUUCGUUGA', 'GAGAAUGGUC', 'UCACUAUUUC', 'UCCAGGGC', 'C']
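A self-contained version of the snippet above that runs on both Python 2 and Python 3, falling back to `izip_longest` when `zip_longest` cannot be imported. The try/except import is a common compatibility idiom, not part of the original answer:

```python
try:
    from itertools import zip_longest                  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC', 'UCCCUGAGACCUCAAGUGUGA',
       'ACACCUGGGCUCUCCGGGUACC', 'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA',
       'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC', 'AGGCAGUGUGGUUAGCUGGUUG',
       'ACGGCUACCUUCACUGCCACCC']

# Each tuple holds the i-th character of every string ('' once a string runs out)
new_lst = [''.join(chars) for chars in zip_longest(*lst, fillvalue='')]
print(new_lst)
```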

The built-in zip(), as well as the itertools function zip_longest() in Python 3 (or, in Python 2, the itertools functions izip() and izip_longest()), are the tools of choice when you want to process two or more iterables (such as lists, strings, or generators) in parallel. To see the difference between zip() and zip_longest(), consider the following:

for chars in zip('ABCD','EFG','HI'):
    print(chars)
print('')
for chars in itertools.zip_longest('ABCD','EFG','HI',fillvalue = ''):
    print(chars)

Output:

('A', 'E', 'H')
('B', 'F', 'I')

('A', 'E', 'H')
('B', 'F', 'I')
('C', 'G', '')
('D', '', '')

The first tuple produced is the tuple of first elements, the second tuple is the tuple of second elements, and so on. zip() (or izip()) stops as soon as the shortest iterable is exhausted. In this case it can't return a tuple of the third characters, since the third input to zip() has no third character. zip_longest() (or izip_longest()) keeps going until the longest iterable is exhausted, substituting fillvalue for the missing items of the shorter iterables. Here I used the empty string, since it disappears when the tuples are joined with ''.
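If you would rather keep every output string the same length, so that a missing character stays visible, you can pass a visible placeholder as fillvalue instead of the empty string. A small sketch; the choice of '-' is arbitrary:

```python
import itertools

# '-' marks positions where a shorter string has already run out
padded = [''.join(chars)
          for chars in itertools.zip_longest('ABCD', 'EFG', 'HI', fillvalue='-')]
print(padded)   # ['AEH', 'BFI', 'CG-', 'D--']
```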

In the above code I hard-wired 3 strings into the call to zip_longest(). For your problem you would either have to type out all 10 inputs explicitly, which would be tedious in the extreme, or use the unpacking operator *. If I have a list:

strings = ['ABCD','EFG', 'HI']

Then

for chars in itertools.zip_longest(*strings, fillvalue = ''):

is equivalent to

for chars in itertools.zip_longest('ABCD','EFG','HI',fillvalue = ''):
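A quick way to convince yourself of this equivalence is to compare the two calls directly; the check below is only an illustration:

```python
import itertools

strings = ['ABCD', 'EFG', 'HI']

# *strings passes each list element as a separate positional argument
unpacked = list(itertools.zip_longest(*strings, fillvalue=''))
explicit = list(itertools.zip_longest('ABCD', 'EFG', 'HI', fillvalue=''))

print(unpacked == explicit)   # True
```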

Upvotes: 3

Yevhen Kuzmovych

Reputation: 12140

You need to iterate over the indices of the longest string:

lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC',
       'UCCCUGAGACCUCAAGUGUGA', 'ACACCUGGGCUCUCCGGGUACC',
       'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA', 
       'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC',
       'AGGCAGUGUGGUUAGCUGGUUG', 'ACGGCUACCUUCACUGCCACCC']

max_len = max(len(x) for x in lst) # length of the longest string
# For each index i, join the i-th character of every string long enough to have one
new_lst = [''.join(x[i] for x in lst if i < len(x)) for i in range(max_len)]
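As a sanity check, this index-based comprehension should give the same result as the zip_longest approach. A small comparison on a toy list (the list `strings` is illustrative):

```python
from itertools import zip_longest

strings = ['ABCD', 'EFG', 'HI']

max_len = max(len(x) for x in strings)
by_index = [''.join(x[i] for x in strings if i < len(x)) for i in range(max_len)]
by_zip = [''.join(chars) for chars in zip_longest(*strings, fillvalue='')]

print(by_index)             # ['AEH', 'BFI', 'CG', 'D']
print(by_index == by_zip)   # True
```

The `if i < len(x)` filter plays the same role as `fillvalue=''`: a string that is too short simply contributes nothing at that position. Note that max() raises ValueError on an empty list; in Python 3.4+ you can write `max((len(x) for x in strings), default=0)` if that case can occur.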

Upvotes: 1
