Scan through txt, append certain data to an empty list in Python

Question

I have a text file that I am reading in Python. I'm trying to extract certain elements from the file that follow keywords and append them to empty lists. The file looks like this:

[image: screenshot of the text file]

So I want to make two empty lists:

1st list will hold the sequence names
2nd list will be a list of lists in the format [Bacteria, Phylum, Class, Order, Family, Genus, Species]

Most of the organisms will be Uncultured bacterium. I am trying to add the Uncultured bacterium together with the following IDs, which are separated by ;

Is there any way to scan for a certain word and, when the word is found, take the word that comes after it [separated by a ' ']?

I need this to create a dictionary that maps each sequence name to its taxonomic data.

I know I will need an empty list to append the names to:

seq_names = []

a second list to put the taxonomy lists into:

taxonomy = []

and a third list that will be reset after every iteration:

temp = []

I'm sure it can be done in Biopython, but I'm working on my Python skills.

MDT · Accepted Answer

Yes, there is a way.

You can split the string you get from reading the file into a list using the built-in string method split. You can then find the index of the word you are looking for and use that index plus one to get the word after it. For example, take a text file called test.txt that looks like this (the words are separated by tabs; the formatting is a bit weird because SO doesn't seem to like hard tabs):

one two three   four    five    six seven   eight   nine

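The following code

# Read the whole file; the with block closes it automatically
with open('test.txt', 'r') as f:
    string = f.read()

# Split on tab characters, then take the word after the first 'seven'
words = string.split('\t')
ind = words.index('seven')
desired = words[ind + 1]

will set desired to 'eight'. Note that index raises a ValueError if the word is not in the list.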

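Edit: To return every word that follows the keyword, not just the first one:

with open('test.txt', 'r') as f:
    string = f.read()

words = string.split('\t')

# The length check skips a keyword that is the very last word,
# which would otherwise raise an IndexError
desired = [words[ind + 1] for ind, word in enumerate(words)
           if word == "seven" and ind + 1 < len(words)]

This uses a list comprehension. It enumerates the list of words and, whenever a word matches the one you are looking for, includes the word at the next index in the result.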
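As an alternative sketch of the same idea, each word can be paired with its successor using zip, so the list is never indexed past its end when the keyword happens to be the last word. The helper name words_after and the in-memory sample string are made up for illustration and are not part of the original answer.

```python
def words_after(words, keyword):
    """Return every word that directly follows `keyword` in `words`."""
    # zip pairs each word with its successor, so the last word is never
    # looked up past the end of the list.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur == keyword]


# Hypothetical tab-separated sample, used instead of reading a file
sample = "one\ttwo\tseven\teight\tnine\tseven\tten\tseven"
words = sample.split("\t")

following = words_after(words, "seven")
print(following)  # ['eight', 'ten']
```

The trailing "seven" has no successor, so it contributes nothing to the result.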

Edit2: To split on both newlines and tabs, you can use regular expressions:

import re

with open('test.txt', 'r') as f:
    string = f.read()

# Split wherever a tab or a newline occurs
words = re.split(r'\t|\n', string)

desired = [words[ind + 1] for ind, word in enumerate(words)
           if word == "seven" and ind + 1 < len(words)]
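Applied to the original question, the same splitting approach could fill the three lists and build the dictionary. This is only a sketch: the real file layout was shown as an image, so it assumes each line holds a sequence name, a tab, and a semicolon-separated lineage. The sample text and the name name_to_taxonomy are hypothetical.

```python
# Hypothetical sample standing in for the file contents. The assumed layout
# is: sequence name, a tab, then the ranks separated by ';'.
sample_text = (
    "seq1\tBacteria;Firmicutes;Bacilli;Lactobacillales;"
    "Lactobacillaceae;Lactobacillus;Uncultured bacterium\n"
    "seq2\tBacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;"
    "Enterobacteriaceae;Escherichia;Escherichia coli\n"
)

seq_names = []
taxonomy = []

for line in sample_text.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # Everything before the first tab is the name, the rest is the lineage
    name, _, lineage = line.partition("\t")
    # temp is rebuilt for every line, so no explicit reset is needed
    temp = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    seq_names.append(name)
    taxonomy.append(temp)

# Map each sequence name to its list of taxonomic ranks
name_to_taxonomy = dict(zip(seq_names, taxonomy))
print(name_to_taxonomy["seq1"][-1])  # Uncultured bacterium
```

To read from a real file, replace sample_text.splitlines() with iteration over the open file object; the delimiters would need adjusting if the actual layout differs.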
