roybatty

Reputation: 105

Select and store data from string in python

I read the contents of a text file and stored them in a list called lines:

print(lines)

['>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n', 'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n', 'AGGACAGGCCGCTAAAGTG\n', '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n', 'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n', 'GCCTGGTAACACGTGCCAGC\n']

If you run this loop:

for i in lines:
   print(i)

You get:

>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79

TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC

AGGACAGGCCGCTAAAGTG

>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80

CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG

GCCTGGTAACACGTGCCAGC

I want to store the uppercase sequences (TTGGTTTCGTGGTTT...) as independent elements of an object so that I can operate on them, with the wrapped lines of each record joined into one string. For example, I would like to be able to do something like:

>>> seq[1]
TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGCAGGACAGGCCGCTAAAGTG

Upvotes: 0

Views: 72

Answers (5)

dawg

Reputation: 103834

I would use a regex:

import re

seq = {}
pattern = r'^(>.*$)\n([ACGTU\n]*?)(?=^>|\Z)'
for i, m in enumerate(re.finditer(pattern, ''.join(lines), flags=re.M)):
    seq[i] = m.group(2).replace('\n', '')

Each FASTA sequence is then mapped to an integer index, starting at 0:

>>> seq
{0: 'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGCAGGACAGGCCGCTAAAGTG', 1: 'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCGGCCTGGTAACACGTGCCAGC'}
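If integer keys are not convenient, the same idea can key each sequence by its record ID instead. The following is only a sketch under assumptions: the name seq_by_id is made up for illustration, the ID is taken to be everything between the ">" and the first whitespace of the header, and the pattern is a slight variation of the one above rather than a general FASTA parser.

```python
import re

# Sample data copied from the question.
lines = [
    '>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n',
    'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n',
    'AGGACAGGCCGCTAAAGTG\n',
    '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n',
    'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n',
    'GCCTGGTAACACGTGCCAGC\n',
]

# Group 1: the ID (header text up to the first whitespace, assumed unique).
# Group 2: the sequence lines up to the next header or the end of the text.
pattern = r'^>(\S+).*\n([ACGTU\n]*?)(?=^>|\Z)'

seq_by_id = {
    m.group(1): m.group(2).replace('\n', '')
    for m in re.finditer(pattern, ''.join(lines), flags=re.M)
}

for record_id, sequence in seq_by_id.items():
    print(record_id, len(sequence))
```

With the sample data, the printed lengths can be compared against the "length:" field in each header as a quick sanity check.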

Upvotes: 1

wuerfelfreak

Reputation: 2439

To check whether a string is all caps, I would use myString == myString.upper().

To get all the all-caps elements, you could use a list comprehension like so:

result = [s for s in lines if s == s.upper()]

This would still allow special characters in your string.

If you only want uppercase letters, also check isalpha(). Strip the line first, because the trailing newline is not alphabetic and would make isalpha() return False:

result = [s for s in lines if s == s.upper() and s.strip().isalpha()]

Upvotes: 1

Jarvis

Reputation: 8564

You can do this:

lines = list(map(str.strip, (filter(str.isupper, lines))))

Upvotes: 2

Pierre D

Reputation: 26211

gattaca = [x.strip() for x in lines if x.isupper()]

>>> gattaca
['TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC',
 'AGGACAGGCCGCTAAAGTG',
 'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG',
 'GCCTGGTAACACGTGCCAGC']
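Note that this gives one element per line, while the question asks for one element per record. A possible way to join the wrapped lines is sketched below; the function name read_sequences is hypothetical, and the sketch assumes that every record starts with a ">" header line and that everything up to the next header belongs to that record.

```python
# Sample data copied from the question.
lines = [
    '>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n',
    'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n',
    'AGGACAGGCCGCTAAAGTG\n',
    '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n',
    'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n',
    'GCCTGGTAACACGTGCCAGC\n',
]


def read_sequences(lines):
    """Join the wrapped lines of each record into a single string."""
    sequences = []
    current = None  # pieces of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if current is not None:
                sequences.append(''.join(current))
            current = []
        elif current is not None:
            current.append(line)
    # Close the last record.
    if current is not None:
        sequences.append(''.join(current))
    return sequences


seq = read_sequences(lines)
print(seq[0])
```

Indexing starts at 0, so seq[0] is the first joined sequence and seq[1] the second.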

Upvotes: 2

DPM

Reputation: 935

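You can use isupper() to check whether each string in the list is upper case. It returns True when every cased character in the string is upper case.

stored = []
for i in lines:
   if i.isupper():
      stored.append(i.strip())  # store the string without its trailing newline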
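To see why isupper() separates the two kinds of lines here, it may help to try it on one header and one sequence line from the question. This is just an illustration of the standard str methods on the sample data; the variable names are made up.

```python
header = '>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n'
sequence = 'AGGACAGGCCGCTAAAGTG\n'

# The header contains lowercase letters ("chr", "total_counts", ...).
print(header.isupper())      # False

# Uncased characters such as the trailing newline are ignored by isupper().
print(sequence.isupper())    # True

# isalpha() is stricter: the newline is not a letter.
print(sequence.isalpha())    # False

# A string with no cased characters at all is not considered upper case.
print('>12_+\n'.isupper())   # False
```

So a header line is only rejected because it happens to contain lowercase letters; checking line.startswith('>') is a more direct test if headers could be all caps.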

Upvotes: 1
