Reputation: 3
I have this data here:
'**Otolemur_crassicaudatus**_/7977-8746 gi|238809369|dbj|**AB371093.1**|':0.00000000,'**Otolemur_crassicaudatus**/7977-8746 gi|238866848|ref|**NC_012762.1**|':
It is all on one line in a .txt file. I was wondering how I would go about extracting the names (i.e. the Otolemur part) and the AB and NC numbers (in bold) and printing them to a new file, without all the other columns. This is a tiny, tiny snippet of what I have, and being able to do this would be such a time saver.
Upvotes: 0
Views: 59
Reputation: 17188
Assuming there's some predictability to the stuff you want to keep, you want a regex of some kind to match the good stuff. Then you can grab the list of matches (`re.findall` returns them as strings) and write them all to a new file however you want. I don't know what your data looks like well enough to write the regex pattern for you, but the basic conversion looks something like this:
```python
import re

# Placeholder: replace with a regex (no capture groups) matching the pieces you want
PATTERN = r'PATTERN'

with open('input.txt', 'r') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        # Write each matching piece to its own line in the new file
        for match in re.findall(PATTERN, line):
            outfile.write(match + '\n')
```
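For the data in the question, a more concrete sketch is possible, assuming every record follows the same layout as the two samples shown: `'NAME/START-END gi|NUMBER|DB|ACCESSION|'`, with an optional trailing underscore on the name. The pattern `RECORD` and the helpers `extract_records` and `convert` are hypothetical names for illustration, and the pattern is inferred only from those two records, so check it against the full file. Because the pattern has two capture groups, `re.findall` returns `(name, accession)` tuples instead of plain strings.

```python
import re

# Hypothetical pattern inferred from the two sample records only:
#   'NAME[_]/START-END gi|GI_NUMBER|DB|ACCESSION|'
# Group 1 captures NAME (a trailing "_" before the "/" is dropped),
# group 2 captures whatever sits in the ACCESSION field (e.g. AB371093.1).
RECORD = re.compile(r"'([^/' ]+?)_?/[^ ]* gi\|\d+\|\w+\|([^|]+)\|")


def extract_records(text):
    """Return a list of (name, accession) tuples found in text."""
    return RECORD.findall(text)


def convert(in_path, out_path):
    """Write one tab-separated 'name<TAB>accession' line per record."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            for name, accession in extract_records(line):
                outfile.write(f"{name}\t{accession}\n")


if __name__ == "__main__":
    sample = (
        "'Otolemur_crassicaudatus_/7977-8746 gi|238809369|dbj|AB371093.1|':0.00000000,"
        "'Otolemur_crassicaudatus/7977-8746 gi|238866848|ref|NC_012762.1|':"
    )
    for name, accession in extract_records(sample):
        print(name, accession)
```

Running it prints `Otolemur_crassicaudatus AB371093.1` and `Otolemur_crassicaudatus NC_012762.1`. Capturing the whole fourth pipe-delimited field, rather than spelling out AB/NC prefixes, keeps the sketch working if the file contains other accession formats.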
Upvotes: 1