Areza

Reputation: 6080

Extracting numbers after certain words in a text file

I have a big text file and I would like to extract only numbers that are after certain phrases/words.

There are dozens of lines in this huge text file in the following format:

Best CV Model for car: 15778 is order:2 threshold: 0 with AUC of : 0.7185 gene aau_roc: 0.466281

One solution is to just look at the number after "for car: X", "is order: X", "threshold: X", and "Y gene aau_roc: X".

At the end I would like to have 15778, 2, 0, 0.7185, 0.466281 for each line.

Upvotes: 5

Views: 3597

Answers (3)

Blckknght

Reputation: 104712

Since you've already tagged your question with regex, I suspect you're already close to a solution. You can write a regular expression pattern that will match all the numbers on your line. Something like:

pattern = r"for car: (\d+) is order:(\d+) threshold: (\d+) with AUC of : ([0-9.]+) gene aau_roc: ([0-9.]+)"

Note, I've made this to exactly match your example string, including some odd spacing around the : characters in a few places. Double check that it actually works with your real data.

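To use this to search your text file, I'd use re.finditer over the whole text. It returns an iterator of match objects, and calling groups() on each match gives the captured numbers as strings:

import re

for match in re.finditer(pattern, text):
    # groups() returns the five captured strings in pattern order
    model, order, threshold, auc, aau_roc = match.groups()
    do_stuff()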
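For illustration, here is a minimal end-to-end sketch of that approach. The `text` variable is a hypothetical stand-in for the file contents, and converting the captures to `int` and `float` is an assumption about what you want to do with them:

```python
import re

# Pattern from the answer; it mirrors the example line's spacing exactly.
pattern = r"for car: (\d+) is order:(\d+) threshold: (\d+) with AUC of : ([0-9.]+) gene aau_roc: ([0-9.]+)"

# Hypothetical stand-in for the contents of the real file.
text = (
    "Best CV Model for car: 15778 is order:2 threshold: 0 with AUC of : 0.7185 gene aau_roc: 0.466281\n"
    "some unrelated line\n"
)

rows = []
for match in re.finditer(pattern, text):
    # Each match carries the five captured strings, in pattern order.
    model, order, threshold, auc, aau_roc = match.groups()
    rows.append((int(model), int(order), int(threshold), float(auc), float(aau_roc)))

print(rows)
```

Lines that do not fit the pattern are simply skipped, so it is worth checking that the number of rows matches what you expect from the real data.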

Upvotes: 2

PearsonArtPhoto

Reputation: 39698

re.search(r'(?<=for car: )\d+', the_line).group()

Just keep repeating for the other variables you need, and store them in your desired output.
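A rough sketch of what that repetition could look like, assuming the labels are spaced exactly as in the question's example line. The `lookbehinds` mapping and its key names are hypothetical, and the number pattern is widened to accept decimals:

```python
import re

the_line = "Best CV Model for car: 15778 is order:2 threshold: 0 with AUC of : 0.7185 gene aau_roc: 0.466281"

# Hypothetical mapping of field name to lookbehind prefix.
# Lookbehinds must be fixed-width, so the spacing has to match the data exactly.
lookbehinds = {
    "car": r"(?<=for car: )",
    "order": r"(?<=is order:)",
    "threshold": r"(?<=threshold: )",
    "auc": r"(?<=AUC of : )",
    "aau_roc": r"(?<=gene aau_roc: )",
}

values = {}
for name, prefix in lookbehinds.items():
    # re.search scans the whole line; re.match would only try position 0.
    found = re.search(prefix + r"\d+(?:\.\d+)?", the_line)
    values[name] = found.group() if found else None

print(values)
```

Each value stays a string (or `None` when the label is missing), so convert with `int` or `float` as needed.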

Upvotes: 0

Steven Rumbalski

Reputation: 45542

>>> if line.startswith('Best CV Model'):
...     re.findall(r'\d+\.{0,1}\d*', line)
... 
['15778', '2', '0', '0.7185', '0.466281']
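To apply the same idea to every line, something like the sketch below might work. The helper name `extract_numbers` and the `sample` list are assumptions for illustration; in practice you would pass an open file object instead of the list:

```python
import re

# Same idea as the pattern above: digits, an optional dot, then optional digits.
NUMBER = re.compile(r"\d+\.?\d*")

def extract_numbers(lines):
    """Yield the numeric strings found on each line starting with 'Best CV Model'."""
    for line in lines:
        if line.startswith("Best CV Model"):
            yield NUMBER.findall(line)

# Hypothetical sample; any iterable of lines (such as an open file) works the same way.
sample = [
    "Best CV Model for car: 15778 is order:2 threshold: 0 with AUC of : 0.7185 gene aau_roc: 0.466281\n",
    "unrelated line with 42 in it\n",
]
results = list(extract_numbers(sample))
print(results)
```

Note that this picks up every number on a matching line, so it relies on the lines containing no other digits (for example, inside a model name).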

Upvotes: 4
